{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "9549c364-20c5-4adb-8a57-5bd08ffd1d3c",
   "metadata": {},
   "source": [
    "# 赛题介绍\n",
    "\n",
    "设法提高用户观看体验，是芒果TV平台的核心技术挑战之一，我们不断为此而努力！及时发现用户的兴趣和调整内容展示对于实现这一目标非常重要，在给定用户观影历史和上下文的行为的条件下，进行序列预测是一项非常重要且困难的推荐任务，本赛题将以此为背景，希望选手在真实样本数据集下建立出最优的序列预测模型。\n",
    "\n",
    "## 赛题任务\n",
    "\n",
    "根据大赛组织方提供的用户在芒果TV产生的行为，视频特征，标签特征等数据，构建模型，预测用户在芒果TV下一个时刻观看的视频。\n",
    "\n",
    "## 数据说明\n",
    "\n",
    "### 用户视频观看行为\n",
    "\n",
    "\n",
    "## 评估指标\n",
    "\n",
    "1. 评估指标是[MRR](https://en.wikipedia.org/wiki/Mean_reciprocal_rank)，将用户实际观看的视频排在更靠前的位置，将得到更高的分数。 \n",
    "2. 算力要求 \n",
    "    - 物理内存不超过256G，显存使用不超过11G。 \n",
    "    - 代码运行时间不超过24小时。未满足以上算力限制的参赛队伍，大赛官方有权将最终总成绩判定无效，排名由后一名依次递补。\n",
    "\n",
    "## 结果提交\n",
    "\n",
    "每个did提交用户最有可能观看的6个视频，从高到低进行排序 \n",
    "选手提交答案样例\n",
    "\n",
    "| did  | vid   | rank |\n",
    "| ---- | ----- | ---- |\n",
    "| Did1 | Vid1  | 1    |\n",
    "| Did1 | Vid3  | 2    |\n",
    "| Did1 | Vid4  | 3    |\n",
    "| Did1 | Vid7  | 4    |\n",
    "| Did1 | Vid2  | 5    |\n",
    "| Did1 | Vid5  | 6    |\n",
    "| Did2 | Vid6  | 1    |\n",
    "| Did2 | Vid7  | 2    |\n",
    "| Did2 | Vid10 | 3    |\n",
    "| Did2 | Vid32 | 4    |\n",
    "| Did2 | Vid8  | 5    |\n",
    "| Did2 | Vid9  | 6    |\n",
    "\n",
    "请选手将测试集的预测结果按照上述文件样式进行提交上传，文件格式为csv，did列表示为用户，vid列表示用户下一个时刻观看的视频, rank是排序位置\n",
    "\n",
    "## 比赛规则\n",
    "\n",
    "1. 每支队伍每天限提交3次结果； \n",
    "2. 比赛排行按照B榜最终得分从高到低排序（如出现相同分数，按照推理耗时从低到高排序）； \n",
    "3. B榜赛程结束后，排名前10的队伍需要提交方案设计说明书、算法源代码。大赛官方组织专家进行审核和必要的沟通询问。 \n",
    "4. 为防止作弊行为，主办方保留了部分未公开数据用于校验方案，如遇作弊行为，将取消选手成绩。\n",
    "\n",
    "## 奖项设置\n",
    "\n",
    "- 冠军：人民币240,000元\n",
    "- 亚军：人民币60,000元\n",
    "- 季军：人民币27,000元\n",
    "- 排名第4至10名：人民币5,000元\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9eb83efb-294c-411f-b265-3f3901815bea",
   "metadata": {},
   "source": [
    "# 数据读取"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "ee0db519-0883-4bfd-93aa-4f1d43d0584a",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-07-02T01:20:06.441537Z",
     "iopub.status.busy": "2022-07-02T01:20:06.440960Z",
     "iopub.status.idle": "2022-07-02T01:20:06.573095Z",
     "shell.execute_reply": "2022-07-02T01:20:06.571594Z",
     "shell.execute_reply.started": "2022-07-02T01:20:06.441488Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "总用量 552M\n",
      "-rw-r--r-- 1 lyz lyz 433K 6月  20 11:21 candidate_items_A.csv\n",
      "-rw-r--r-- 1 lyz lyz 507M 6月  20 11:19 main_vv_seq_train.csv\n",
      "-rw-r--r-- 1 lyz lyz  44M 6月  20 11:18 vid_info.csv\n"
     ]
    }
   ],
   "source": [
    "!ls ./data_v2/ -lh"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "8930f9e1-b454-4fc7-852d-483af2d401b4",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-07-15T11:11:18.306034Z",
     "iopub.status.busy": "2022-07-15T11:11:18.305457Z",
     "iopub.status.idle": "2022-07-15T11:11:31.586579Z",
     "shell.execute_reply": "2022-07-15T11:11:31.586079Z",
     "shell.execute_reply.started": "2022-07-15T11:11:18.305985Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Populating the interactive namespace from numpy and matplotlib\n"
     ]
    }
   ],
   "source": [
    "%pylab inline\n",
    "\n",
    "import os\n",
    "import sys\n",
    "import codecs\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "from tqdm import tqdm\n",
    "\n",
    "\n",
    "def read_data(path='data_v2/'):\n",
    "    vid_info = pd.read_csv(os.path.join(path, 'vid_info.csv'))\n",
    "    vid_info['stars'] = vid_info['stars'].apply(eval)\n",
    "    vid_info['tags'] = vid_info['tags'].apply(eval)\n",
    "    vid_info['key_word'] = vid_info['key_word'].apply(eval)\n",
    "    vid_info.sort_values(by=['cid', 'serialno'], inplace=True)\n",
    "    # vid_info.set_index('vid', inplace=True)\n",
    "\n",
    "    seq_train = pd.read_csv(os.path.join(path, 'main_vv_seq_train.csv'))\n",
    "    seq_train = seq_train.sort_values(by=['did', 'seq_no'])\n",
    "    seq_train.reset_index(inplace=True)\n",
    "\n",
    "    candidate_items = pd.read_csv(os.path.join(path, 'candidate_items_A.csv'))\n",
    "    return vid_info, seq_train, candidate_items\n",
    "\n",
    "\n",
    "vid_info, seq_train, candidate_items = read_data()\n",
    "\n",
    "seq_vid_train = pd.merge(seq_train, vid_info, on='vid', how='left')\n",
    "seq_vid_train['vts'] /= seq_vid_train['duration']\n",
    "seq_vid_train['hb'] /= seq_vid_train['duration']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "30ba6807-0337-4fbe-b619-0d33a3e3efc9",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-07-13T08:08:19.210091Z",
     "iopub.status.busy": "2022-07-13T08:08:19.209866Z",
     "iopub.status.idle": "2022-07-13T08:08:19.221494Z",
     "shell.execute_reply": "2022-07-13T08:08:19.221089Z",
     "shell.execute_reply.started": "2022-07-13T08:08:19.210075Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>vid</th>\n",
       "      <th>cid</th>\n",
       "      <th>is_intact</th>\n",
       "      <th>serialno</th>\n",
       "      <th>classify_id</th>\n",
       "      <th>series_id</th>\n",
       "      <th>duration</th>\n",
       "      <th>title_length</th>\n",
       "      <th>upgc_flag</th>\n",
       "      <th>stars</th>\n",
       "      <th>tags</th>\n",
       "      <th>key_word</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>30044</th>\n",
       "      <td>49ce55753d994b17146098553a787982</td>\n",
       "      <td>00049722020c746dbc9164e52a7833b4</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>50</td>\n",
       "      <td>0</td>\n",
       "      <td>345</td>\n",
       "      <td>38</td>\n",
       "      <td>0</td>\n",
       "      <td>[101031605, 101031603]</td>\n",
       "      <td>[105013965]</td>\n",
       "      <td>[189165, 369343, 747602, 407197, 865416, 24194...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30045</th>\n",
       "      <td>2773a834f086b3ce6eca919f27211752</td>\n",
       "      <td>00049722020c746dbc9164e52a7833b4</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>50</td>\n",
       "      <td>0</td>\n",
       "      <td>214</td>\n",
       "      <td>38</td>\n",
       "      <td>0</td>\n",
       "      <td>[101031605, 101031603]</td>\n",
       "      <td>[105013965]</td>\n",
       "      <td>[189165, 369343, 747602, 407197, 865416, 24194...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                    vid                               cid  \\\n",
       "30044  49ce55753d994b17146098553a787982  00049722020c746dbc9164e52a7833b4   \n",
       "30045  2773a834f086b3ce6eca919f27211752  00049722020c746dbc9164e52a7833b4   \n",
       "\n",
       "       is_intact  serialno  classify_id  series_id  duration  title_length  \\\n",
       "30044          1         1           50          0       345            38   \n",
       "30045          1         2           50          0       214            38   \n",
       "\n",
       "       upgc_flag                   stars         tags  \\\n",
       "30044          0  [101031605, 101031603]  [105013965]   \n",
       "30045          0  [101031605, 101031603]  [105013965]   \n",
       "\n",
       "                                                key_word  \n",
       "30044  [189165, 369343, 747602, 407197, 865416, 24194...  \n",
       "30045  [189165, 369343, 747602, 407197, 865416, 24194...  "
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "vid_info.head(2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "5d901468-775e-4530-8f1b-ec7e69453393",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-07-13T08:08:19.222467Z",
     "iopub.status.busy": "2022-07-13T08:08:19.222301Z",
     "iopub.status.idle": "2022-07-13T08:08:19.293402Z",
     "shell.execute_reply": "2022-07-13T08:08:19.292926Z",
     "shell.execute_reply.started": "2022-07-13T08:08:19.222452Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>index</th>\n",
       "      <th>did</th>\n",
       "      <th>vid</th>\n",
       "      <th>vts</th>\n",
       "      <th>hb</th>\n",
       "      <th>seq_no</th>\n",
       "      <th>cpn</th>\n",
       "      <th>fpn</th>\n",
       "      <th>time_gap</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2696560</td>\n",
       "      <td>0000d0aabe8c188f88c756ce0f7f9639</td>\n",
       "      <td>f87cf2ad695b4bb5dd830ae40bf29475</td>\n",
       "      <td>44.0</td>\n",
       "      <td>3286.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1</td>\n",
       "      <td>130</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2696559</td>\n",
       "      <td>0000d0aabe8c188f88c756ce0f7f9639</td>\n",
       "      <td>fde2b2a62fb6061e4a958fb0b78c0293</td>\n",
       "      <td>1260.0</td>\n",
       "      <td>3698.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>1</td>\n",
       "      <td>130</td>\n",
       "      <td>108.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     index                               did  \\\n",
       "0  2696560  0000d0aabe8c188f88c756ce0f7f9639   \n",
       "1  2696559  0000d0aabe8c188f88c756ce0f7f9639   \n",
       "\n",
       "                                vid     vts      hb  seq_no  cpn  fpn  \\\n",
       "0  f87cf2ad695b4bb5dd830ae40bf29475    44.0  3286.0     1.0    1  130   \n",
       "1  fde2b2a62fb6061e4a958fb0b78c0293  1260.0  3698.0     2.0    1  130   \n",
       "\n",
       "   time_gap  \n",
       "0       NaN  \n",
       "1     108.0  "
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "seq_train.head(2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "cee62a4e-df4e-4276-abdb-f94cd2b35d25",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-07-13T08:08:19.294575Z",
     "iopub.status.busy": "2022-07-13T08:08:19.294321Z",
     "iopub.status.idle": "2022-07-13T08:08:19.364725Z",
     "shell.execute_reply": "2022-07-13T08:08:19.364006Z",
     "shell.execute_reply.started": "2022-07-13T08:08:19.294557Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>vid</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>5bc78a50602b520bb3f6c87e3c542f1c</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>73e37445d73561ffdf0711a3ffe4ca25</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                vid\n",
       "0  5bc78a50602b520bb3f6c87e3c542f1c\n",
       "1  73e37445d73561ffdf0711a3ffe4ca25"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "candidate_items.head(2)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "046f114a-048a-40b6-ba84-4f78f4df37cd",
   "metadata": {},
   "source": [
    "# 数据分析"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6aac63da-a25d-46e6-bcec-762a9ca7da58",
   "metadata": {},
   "source": [
    "## 视频信息表\n",
    "\n",
    "| 字段        | 说明           | 示例                                                  | 备注                          |\n",
    "| ----------- | -------------- | ----------------------------------------------------- | ----------------------------- |\n",
    "| vid         | 视频id         | 239fa8776d81792a7477053184e7d6af                      | string                        |\n",
    "| cid         | 合集id         | 9f84a64963ac7f263f56168b9b0819dc                      | string                        |\n",
    "| Is_intact   | 视频类型       | 1                                                     | int                           |\n",
    "| serialno    | 集号           | 1                                                     | int                           |\n",
    "| series_id   | 系列id         | 0                                                     | int                           |\n",
    "| duration    | 视频时长       | 1200                                                  | int                           |\n",
    "| title_len   | 标题长度       | 30                                                    | int                           |\n",
    "| upgc_flag   | upgc标识       | 0/1                                                   | int                           |\n",
    "| classify_id | 频道类型       | 3                                                     | int(表示电影，电视剧，动漫等) |\n",
    "| stars       | 包含的明星集合 | [101000001, 101000002, 101000005, 101000004,........] | array                         |\n",
    "| keywords    | 关键词         | [0,4,5,78,9]                                          | array                         |\n",
    "| tags        | 视频标签集合   | [201000001, 301000002, 401000005, 201000004,........] | array      "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c447442f-e1a1-48b2-b3cf-ae0849457131",
   "metadata": {},
   "source": [
    "## 用户视频观看行为\n",
    "\n",
    "| 字段     | 说明                          | 示例                             | 数据类型 |\n",
    "| -------- | ----------------------------- | -------------------------------- | -------- |\n",
    "| did      | 用户设备id                    | e55947fb8d27288721bbaaf90aea4766 | string   |\n",
    "| vid      | 视频id                        | 239fa8776d81792a7477053184e7d6af | string   |\n",
    "| vts      | 用户观看vid的播放时长         | 20                               | int      |\n",
    "| hb       | 用户最后一次观看vid的进度时长 | 50                               | int      |\n",
    "| seq_no   | 用户观看视频行为的序列号      | 1                                | int      |\n",
    "| time_gap | 距离上一次观看视频的时间间隔  | 30                               | int      |\n",
    "| cpn      | 当前页面类型                  | 12                               | int      |\n",
    "| fpn      | 上一次跳转页面类型            | 1                                | int      |"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "13a8db60-e7e9-4c0f-b363-33470969e7c9",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-07-13T08:08:57.411511Z",
     "iopub.status.busy": "2022-07-13T08:08:57.410902Z",
     "iopub.status.idle": "2022-07-13T08:08:58.060470Z",
     "shell.execute_reply": "2022-07-13T08:08:58.059781Z",
     "shell.execute_reply.started": "2022-07-13T08:08:57.411460Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<AxesSubplot:ylabel='Frequency'>"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAZEAAAD4CAYAAAAtrdtxAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAAsTAAALEwEAmpwYAAAX+UlEQVR4nO3dfZBddZ3n8ffH8OgjIC1LJTDBMTVudDRiC7F0ahVLCLAjuOu4sDNDyqKMW0KV1lq7Bndq8IkqrNqRkSmlxCFLcB0Rn4YsxM1EpGbKP3gIEoGALC3GJTGSSHjQ0YWB+e4f9xe8E7qTy0nf2+n0+1V1qs/5nt859/drGj6cc3733lQVkiR18YKZ7oAkafYyRCRJnRkikqTODBFJUmeGiCSps4NmugOjdvTRR9fChQtnuhuSNKvccccdv6iqsd3rcy5EFi5cyIYNG2a6G5I0qyT56WR1b2dJkjobWogkOSzJbUl+mGRTkk+0+tVJfpJkY1uWtHqSXJ5kIsldSU7sO9fyJA+0ZXlf/Y1J7m7HXJ4kwxqPJOm5hnk760nglKr6VZKDge8n+U7b91+q6hu7tT8dWNSWk4ErgJOTHAVcDIwDBdyRZE1VPdravB+4FVgLLAO+gyRpJIZ2JVI9v2qbB7dlT5+xchZwTTvuFuCIJMcCpwHrq2pnC471wLK276VVdUv1PrvlGuDsYY1HkvRcQ30mkmReko3AdnpBcGvbdUm7ZXVZkkNbbT7wUN/hW1ptT/Utk9Qn68eKJBuSbNixY8e+DkuS1Aw1RKrqmapaAiwATkryWuAi4NXAm4CjgI8Osw+tH1dW1XhVjY+NPWeGmiSpo5HMzqqqx4CbgWVVta3dsnoS+B/ASa3ZVuC4vsMWtNqe6gsmqUuSRmSYs7PGkhzR1g8H3gn8qD3LoM2kOhu4px2yBjivzdJaCjxeVduAdcCpSY5MciRwKrCu7XsiydJ2rvOA64c1HknScw1zdtaxwOok8+iF1XVVdUOS7yUZAwJsBP5Ta78WOAOYAH4NvA+gqnYm+RRwe2v3yara2dY/CFwNHE5vVpYzsyRphDLXvpRqfHy8ur5jfeHKG59d33zpmdPVJUna7yW5o6rGd6/7jnVJUmeGiCSpM0NEktSZISJJ6swQkSR1ZohIkjozRCRJnRkikqTODBFJUmeGiCSpM0NEktSZISJJ6swQkSR1ZohIkjozRCRJnRkikqTODBFJUmeGiCSpM0NEktSZISJJ6swQkSR1NrQQSXJYktuS/DDJpiSfaPUTktyaZCLJ15Ic0uqHtu2Jtn9h37kuavX7k5zWV1/WahNJVg5rLJKkyQ3zSuRJ4JSqej2wBFiWZCnwGeCyqnoV8Chwfmt/PvBoq1/W2pFkMXAO8BpgGfCFJPOSzAM+D5wOLAbObW0lSSMytBCpnl+1zYPbUsApwDdafTVwdls/q23T9r8jSVr92qp6sqp+AkwAJ7VloqoerKqngGtbW0nSiAz1mUi7YtgIbAfWAz8GHquqp1uTLcD8tj4feAig7X8ceHl/fbdjpqpP1o8VSTYk2bBjx45pGJkkCYYcIlX1TFUtARbQu3J49TBfbw/9uLKqxqtqfGxsbCa6IEkHpJHMzqqqx4CbgTcDRyQ5qO1aAGxt61uB4wDa/pcBj/TXdztmqrokaUSGOTtrLMkRbf1w4J3AffTC5D2t2XLg+ra+pm3T9n+vqqrVz2mzt04AFgG3AbcDi9psr0PoPXxfM6zxSJKe66C9N+nsWGB1m0X1AuC6qrohyb3AtUk+DdwJXNXaXwV8OckEsJNeKFBVm5JcB9wLPA1cUFXPACS5EFgHzANWVdWmIY5HkrSboYVIVd0FvGGS+oP0no/sXv9/wB9Nca5LgEsmqa8F1u5zZyVJnfiOdUlSZ4aIJKkzQ0SS1JkhIknqzBCRJHVmiEiSOjNEJEmdGSKSpM4MEUlSZ4aIJKkzQ0SS1JkhIknqzBCRJHVm
iEiSOjNEJEmdGSKSpM4MEUlSZ4aIJKkzQ0SS1NnQvmP9QLdw5Y3Prm++9MwZ7IkkzRyvRCRJnQ0tRJIcl+TmJPcm2ZTkQ63+8SRbk2xsyxl9x1yUZCLJ/UlO66sva7WJJCv76ickubXVv5bkkGGNR5L0XMO8Enka+EhVLQaWAhckWdz2XVZVS9qyFqDtOwd4DbAM+EKSeUnmAZ8HTgcWA+f2necz7VyvAh4Fzh/ieCRJuxlaiFTVtqr6QVv/JXAfMH8Ph5wFXFtVT1bVT4AJ4KS2TFTVg1X1FHAtcFaSAKcA32jHrwbOHspgJEmTGskzkSQLgTcAt7bShUnuSrIqyZGtNh94qO+wLa02Vf3lwGNV9fRu9clef0WSDUk27NixYzqGJEliBCGS5MXAN4EPV9UTwBXA7wJLgG3AXwy7D1V1ZVWNV9X42NjYsF9OkuaMoU7xTXIwvQD5SlV9C6CqHu7b/yXghra5FTiu7/AFrcYU9UeAI5Ic1K5G+ttLkkZgmLOzAlwF3FdVn+2rH9vX7N3APW19DXBOkkOTnAAsAm4DbgcWtZlYh9B7+L6mqgq4GXhPO345cP2wxiNJeq5hXom8BfhT4O4kG1vtY/RmVy0BCtgMfACgqjYluQ64l97Mrguq6hmAJBcC64B5wKqq2tTO91Hg2iSfBu6kF1qSpBEZWohU1feBTLJr7R6OuQS4ZJL62smOq6oH6c3ekiTNAN+xLknqzBCRJHVmiEiSOjNEJEmdGSKSpM4MEUlSZ4aIJKkzQ0SS1JkhIknqzBCRJHVmiEiSOjNEJEmdGSKSpM4MEUlSZwOFSJLfH3ZHJEmzz6BXIl9IcluSDyZ52VB7JEmaNQYKkar6A+CP6X3X+R1J/ibJO4faM0nSfm/gZyJV9QDwZ/S+kvbfAJcn+VGSfzeszkmS9m+DPhN5XZLLgPuAU4A/rKp/3dYvG2L/JEn7sUG/Y/2vgL8GPlZVv9lVrKqfJfmzofRMkrTfGzREzgR+U1XPACR5AXBYVf26qr48tN5JkvZrgz4T+S5weN/2C1ttSkmOS3JzknuTbEryoVY/Ksn6JA+0n0e2epJcnmQiyV1JTuw71/LW/oEky/vqb0xydzvm8iQZdOCSpH03aIgcVlW/2rXR1l+4l2OeBj5SVYuBpcAFSRYDK4GbqmoRcFPbBjgdWNSWFcAV0Asd4GLgZOAk4OJdwdPavL/vuGUDjkeSNA0GDZF/3O3K4I3Ab/bQnqraVlU/aOu/pPdQfj5wFrC6NVsNnN3WzwKuqZ5bgCOSHAucBqyvqp1V9SiwHljW9r20qm6pqgKu6TuXJGkEBn0m8mHg60l+BgT4V8B/GPRFkiwE3gDcChxTVdvarp8Dx7T1+cBDfYdtabU91bdMUpckjchAIVJVtyd5NfB7rXR/Vf3TIMcmeTHwTeDDVfVE/2OLqqok9Tz7/LwlWUHvFhnHH3/8sF9OkuaM5/MBjG8CXgecCJyb5Ly9HZDkYHoB8pWq+lYrP9xuRdF+bm/1rfTeEb/LglbbU33BJPXnqKorq2q8qsbHxsb21m1J0oAGfbPhl4H/DryVXpi8CRjfyzEBrgLuq6rP9u1aA+yaYbUcuL6vfl6bpbUUeLzd9loHnJrkyPZA/VRgXdv3RJKl7bXO6zuXJGkEBn0mMg4sbg+wB/UW4E+Bu5NsbLWPAZcC1yU5H/gp8N62by1wBjAB/Bp4H0BV7UzyKeD21u6TVbWzrX8QuJre9OPvtEWSNCKDhsg99B6mb9tbw12q6vv0HsJP5h2TtC/gginOtQpYNUl9A/DaQfskSZpeg4bI0cC9SW4DntxVrKp3DaVXs8zClTc+u7750jNnsCeSNFqDhsjHh9kJSdLsNOgU379P8jvAoqr6bpIXAvOG2zVJ0v5u0NlZ7we+AXyxleYDfzukPkmSZolB3ydyAb3ZVk/As19Q9YphdUqSNDsMGiJPVtVTuzaSHAQM/Z3mkqT926Ah8vdJPgYc3r5b/evA
/xpetyRJs8GgIbIS2AHcDXyA3hsD/UZDSZrjBp2d9c/Al9oiSRIwYIgk+QmTPAOpqldOe48kSbPG8/nsrF0OA/4IOGr6uyNJmk0GeiZSVY/0LVur6i8BP99Dkua4QW9nndi3+QJ6VyaDXsVIkg5QgwbBX/StPw1s5rcf4S5JmqMGnZ319mF3RJI0+wx6O+s/72n/bt9cKEmaI57P7Kw30fsKW4A/BG4DHhhGpyRJs8OgIbIAOLGqfgmQ5OPAjVX1J8PqmCRp/zfox54cAzzVt/1Uq0mS5rBBr0SuAW5L8u22fTaweig9kiTNGoPOzrokyXeAP2il91XVncPrliRpNhj0dhbAC4EnqupzwJYkJwypT5KkWWLQr8e9GPgocFErHQz8z70csyrJ9iT39NU+nmRrko1tOaNv30VJJpLcn+S0vvqyVptIsrKvfkKSW1v9a0kOGWzIkqTpMuiVyLuBdwH/CFBVPwNespdjrgaWTVK/rKqWtGUtQJLFwDnAa9oxX0gyL8k84PPA6cBi4NzWFuAz7VyvAh4Fzh9wLJKkaTLog/WnqqqSFECSF+3tgKr6hyQLBzz/WcC1VfUk8JMkE8BJbd9EVT3YXvda4Kwk9wGnAP+xtVkNfBy4YsDXG5qFK298dn3zpX5GpaQD26BXItcl+SJwRJL3A9+l+xdUXZjkrna768hWmw881NdmS6tNVX858FhVPb1bfVJJViTZkGTDjh07OnZbkrS7vYZIkgBfA74BfBP4PeDPq+qvOrzeFcDvAkuAbfzLD3Ycmqq6sqrGq2p8bGxsFC8pSXPCXm9ntdtYa6vq94H1+/JiVfXwrvUkXwJuaJtbgeP6mi5oNaaoP0LvquigdjXS316SNCKD3s76QZI37euLJTm2b/PdwK6ZW2uAc5Ic2qYOL6L32Vy3A4vaTKxD6D18X1NVBdwMvKcdvxy4fl/7J0l6fgZ9sH4y8CdJNtOboRV6Fymvm+qAJF8F3gYcnWQLcDHwtiRL6H1f+2bgA/ROtCnJdcC99L6v5IKqeqad50JgHTAPWFVVm9pLfBS4NsmngTuBqwYciyRpmuwxRJIcX1X/FzhtT+0mU1XnTlKe8j/0VXUJcMkk9bXA2knqD/LbGVySpBmwtyuRv6X36b0/TfLNqvr3I+iTJGmW2NszkfStv3KYHZEkzT57C5GaYl2SpL3eznp9kifoXZEc3tbhtw/WXzrU3kmS9mt7DJGqmjeqjkiSZp/n81HwkiT9C4aIJKkzQ0SS1JkhIknqzBCRJHVmiEiSOjNEJEmdGSKSpM4G/Sh4deD3rUs60HklIknqzBCRJHVmiEiSOjNEJEmdGSKSpM4MEUlSZ4aIJKmzoYVIklVJtie5p692VJL1SR5oP49s9SS5PMlEkruSnNh3zPLW/oEky/vqb0xydzvm8iRBkjRSw7wSuRpYtlttJXBTVS0CbmrbAKcDi9qyArgCeqEDXAycDJwEXLwreFqb9/cdt/trSZKGbGghUlX/AOzcrXwWsLqtrwbO7qtfUz23AEckORY4DVhfVTur6lFgPbCs7XtpVd1SVQVc03cuSdKIjPqZyDFVta2t/xw4pq3PBx7qa7el1fZU3zJJfVJJViTZkGTDjh079m0EkqRnzdiD9XYFUSN6rSuraryqxsfGxkbxkpI0J4w6RB5ut6JoP7e3+lbguL52C1ptT/UFk9QlSSM06hBZA+yaYbUcuL6vfl6bpbUUeLzd9loHnJrkyPZA/VRgXdv3RJKlbVbWeX3n2i8tXHnjs4skHSiG9lHwSb4KvA04OskWerOsLgWuS3I+8FPgva35WuAMYAL4NfA+gKrameRTwO2t3SeratfD+g/SmwF2OPCdtkiSRmhoIVJV506x6x2TtC3gginOswpYNUl9A/DafemjJGnf+I51SVJnhogkqTNDRJLUmSEiSerMEJEkdWaISJI6M0QkSZ0N7X0imlr/u9Y3X3rmDPZEkvaNVyKSpM4MEUlSZ4aIJKkz
Q0SS1JkhIknqzNlZM8yZWpJmM69EJEmdGSKSpM4MEUlSZ4aIJKkzQ0SS1JkhIknqzCm++xGn+0qabWbkSiTJ5iR3J9mYZEOrHZVkfZIH2s8jWz1JLk8ykeSuJCf2nWd5a/9AkuUzMRZJmstm8nbW26tqSVWNt+2VwE1VtQi4qW0DnA4sassK4ArohQ5wMXAycBJw8a7gkSSNxv70TOQsYHVbXw2c3Ve/pnpuAY5IcixwGrC+qnZW1aPAemDZiPssSXPaTIVIAX+X5I4kK1rtmKra1tZ/DhzT1ucDD/Udu6XVpqpLkkZkph6sv7WqtiZ5BbA+yY/6d1ZVJanperEWVCsAjj/++Ok6rSTNeTNyJVJVW9vP7cC36T3TeLjdpqL93N6abwWO6zt8QatNVZ/s9a6sqvGqGh8bG5vOoUjSnDbyEEnyoiQv2bUOnArcA6wBds2wWg5c39bXAOe1WVpLgcfbba91wKlJjmwP1E9tNUnSiMzE7axjgG8n2fX6f1NV/zvJ7cB1Sc4Hfgq8t7VfC5wBTAC/Bt4HUFU7k3wKuL21+2RV7RzdMCRJIw+RqnoQeP0k9UeAd0xSL+CCKc61Clg13X2UJA3Gd6zPAr6TXdL+yhDZT/UHhyTtr/anNxtKkmYZr0RmGW9tSdqfeCUiSerMEJEkdebtrFnMW1uSZppXIpKkzgwRSVJn3s46QOz+vhJvb0kaBa9EJEmdeSVygPKhu6RR8EpEktSZVyJzgFclkobFKxFJUmdeicwxXpVImk5eiUiSOvNKZA7zqkTSvjJEBBgokroxRLRHhoukPTFE9BxTfTWvgSJpd4aIOpkqaAwXaW6Z9SGSZBnwOWAe8NdVdekMd2lOmypc+hk00oFjVodIknnA54F3AluA25Osqap7Z7Zn2pNBgmYQhpE082Z1iAAnARNV9SBAkmuBswBDZA6YrjAaNcNPB5LZHiLzgYf6trcAJ+/eKMkKYEXb/FWS+5/HaxwN/KJzD2e3uTr2oY47nxnWmfeZ/7znluc77t+ZrDjbQ2QgVXUlcGWXY5NsqKrxae7SrDBXx+645xbHvW9m+8eebAWO69te0GqSpBGY7SFyO7AoyQlJDgHOAdbMcJ8kac6Y1bezqurpJBcC6+hN8V1VVZum+WU63QY7QMzVsTvuucVx74NU1XScR5I0B83221mSpBlkiEiSOjNE9iDJsiT3J5lIsnKm+zOdkqxKsj3JPX21o5KsT/JA+3lkqyfJ5e33cFeSE2eu5/smyXFJbk5yb5JNST7U6gf02JMcluS2JD9s4/5Eq5+Q5NY2vq+1CSokObRtT7T9C2d0APsoybwkdya5oW0f8ONOsjnJ3Uk2JtnQatP+d26ITKHvI1VOBxYD5yZZPLO9mlZXA8t2q60EbqqqRcBNbRt6v4NFbVkBXDGiPg7D08BHqmoxsBS4oP1zPdDH/iRwSlW9HlgCLEuyFPgMcFlVvQp4FDi/tT8feLTVL2vtZrMPAff1bc+Vcb+9qpb0vR9k+v/Oq8plkgV4M7Cub/si4KKZ7tc0j3EhcE/f9v3AsW39WOD+tv5F4NzJ2s32Bbie3mevzZmxAy8EfkDv0x1+ARzU6s/+zdOb8fjmtn5Qa5eZ7nvH8S5o/8E8BbgByBwZ92bg6N1q0/537pXI1Cb7SJX5M9SXUTmmqra19Z8Dx7T1A/J30W5VvAG4lTkw9nZLZyOwHVgP/Bh4rKqebk36x/bsuNv+x4GXj7TD0+cvgf8K/HPbfjlzY9wF/F2SO9pHP8EQ/s5n9ftENDxVVUkO2PnfSV4MfBP4cFU9keTZfQfq2KvqGWBJkiOAbwOvntkeDV+Sfwtsr6o7krxthrszam+tqq1JXgGsT/Kj/p3T9XfulcjU5uJHqjyc5FiA9nN7qx9Qv4skB9MLkK9U1bdaeU6MHaCqHgNupncb54gku/5nsn9sz4677X8Z8Mhoezot3gK8K8lm4Fp6t7Q+x4E/bqpqa/u5nd7/NJzEEP7O
DZGpzcWPVFkDLG/ry+k9L9hVP6/N4FgKPN53STyrpHfJcRVwX1V9tm/XAT32JGPtCoQkh9N7DnQfvTB5T2u2+7h3/T7eA3yv2s3y2aSqLqqqBVW1kN6/w9+rqj/mAB93khclecmudeBU4B6G8Xc+0w9/9ucFOAP4P/TuHf+3me7PNI/tq8A24J/o3f88n96935uAB4DvAke1tqE3U+3HwN3A+Ez3fx/G/VZ694rvAja25YwDfezA64A727jvAf681V8J3AZMAF8HDm31w9r2RNv/ypkewzT8Dt4G3DAXxt3G98O2bNr1369h/J37sSeSpM68nSVJ6swQkSR1ZohIkjozRCRJnRkikqTODBFJUmeGiCSps/8PVXF1a6cY05YAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# 每个用户查看视频的vid的个数\n",
    "seq_vid_train['did'].value_counts().plot(kind='hist', bins=100)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "3e613216-b680-47ba-908b-91c08a3c62d2",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-07-13T08:08:59.151690Z",
     "iopub.status.busy": "2022-07-13T08:08:59.151068Z",
     "iopub.status.idle": "2022-07-13T08:09:00.890760Z",
     "shell.execute_reply": "2022-07-13T08:09:00.890289Z",
     "shell.execute_reply.started": "2022-07-13T08:08:59.151639Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<AxesSubplot:ylabel='Frequency'>"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAZEAAAD4CAYAAAAtrdtxAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAAsTAAALEwEAmpwYAAAabklEQVR4nO3df5AX9Z3n8ecrIGoSDaCzHAW4QxIuOZJNkEyUqyRbWT1hwI2QW5PDyq6cR8luBati7d6tmFytbhKq9LYSNlypG4yc4CVB1sRlTvEIMZpU/uDHqMgvwzJBPJggzAqKrllYzPv+6M+Qdvx+hy8N/f3hvB5VXdP97k9/v+/uAd58uj/drYjAzMysiHc0OgEzM2tdLiJmZlaYi4iZmRXmImJmZoW5iJiZWWHDG51AvV188cXR3t7e6DTMzFrKU0899U8R0TYwPuSKSHt7O93d3Y1Ow8yspUh6oVLcp7PMzKwwFxEzMyvMRcTMzAorvYhIGibpGUmPpOWJkjZK6pH0oKQRKX5uWu5J69tzn3Friu+SNCMX70yxHkmLyt4XMzN7s3r0RL4EPJdbvhNYEhHvB44A81N8PnAkxZekdkiaDMwFPgR0AnenwjQMuAuYCUwGrkttzcysTkotIpLGA1cD30nLAq4AHkpNVgBz0vzstExaf2VqPxtYFRHHIuJ5oAe4LE09EbEnIo4Dq1JbMzOrk7J7In8L/CXwm7R8EfByRJxIy/uBcWl+HLAPIK1/JbU/GR+wTbX4W0haIKlbUndfX98Z7pKZmfUrrYhI+kPgUEQ8VdZ31CoilkVER0R0tLW95V4ZMzMrqMybDT8BXCNpFnAecCHwLWCkpOGptzEe6E3te4EJwH5Jw4H3AC/l4v3y21SLm5lZHZTWE4mIWyNifES0k10Y/0lEfAF4Arg2NZsHrEnzXWmZtP4nkb0xqwuYm0ZvTQQmAZuAzcCkNNprRPqOrrL2Z6D2RY+enMzMhqpGPPbkFmCVpK8DzwD3pfh9wAOSeoDDZEWBiNghaTWwEzgBLIyINwAk3QSsA4YByyNiR133xMxsiKtLEYmIJ4En0/wespFVA9v8C/C5KtsvBhZXiK8F1p7FVM3M7DT4jnUzMyvMRcTMzApzETEzs8JcRMzMrDAXETMzK8xFxMzMCnMRMTOzwlxEzMysMBcRMzMrzEXEzMwKcxExM7PCXETMzKwwFxEzMyvMRcTMzApzETEzs8JcRMzMrDAXETMzK6y0IiLpPEmbJD0raYekv07x+yU9L2lLmqakuCQtldQjaaukqbnPmidpd5rm5eIfk7QtbbNUksraHzMze6syX497DLgiIl6TdA7wc0mPpXX/LSIeGtB+JjApTZcD9wCXSxoN3AZ0AAE8JakrIo6kNjcCG8lek9sJPIaZmdVFaT2RyLyWFs9JUwyyyWxgZdpuAzBS0lhgBrA+Ig6nwrEe6EzrLoyIDRERwEpgTln7Y2Zmb1XqNRFJwyRtAQ6RFYKNadXidMpqiaRzU2wcsC+3+f4UGyy+v0K8Uh4LJHVL6u7r6zvT3TIzs6TUIhIRb0TEFGA8cJmkDwO3Ah8EPg6MBm4pM4eUx7KI6IiIjra2trK/zsxsyKjL6KyIeBl4AuiMiAPplNUx4H8Bl6VmvcCE3GbjU2yw+PgKcTMzq5MyR2e1SRqZ5s8HrgJ+ka5lkEZSzQG2p026gOvTKK1pwCsRcQBYB0yXNErSKGA6sC6tOyppWvqs64E1Ze2PmZm9VZmjs8YCKyQNIytWqyPiEUk/kdQGCNgC/FlqvxaYBfQArwM3AETEYUlfAzandl+NiMNp/ovA/cD5ZKOyPDLLzKyOSisiEbEVuLRC/Ioq7QNYWGXdcmB5hXg38OEzy9TMzIryHetmZlaYi4iZmRXmImJmZoW5iJiZWWEuImZmVpiLiJmZFeYiYmZmhbmImJlZYWXesT5ktC969OT83juubmAmZmb15Z6ImZkV5iJi
ZmaFuYiYmVlhLiJmZlaYi4iZmRXmImJmZoW5iJiZWWEuImZmVliZ71g/T9ImSc9K2iHpr1N8oqSNknokPShpRIqfm5Z70vr23GfdmuK7JM3IxTtTrEfSorL2xczMKiuzJ3IMuCIiPgpMATolTQPuBJZExPuBI8D81H4+cCTFl6R2SJoMzAU+BHQCd0salt7dfhcwE5gMXJfamplZnZRWRCLzWlo8J00BXAE8lOIrgDlpfnZaJq2/UpJSfFVEHIuI54Ee4LI09UTEnog4DqxKbc3MrE5KvSaSegxbgEPAeuCXwMsRcSI12Q+MS/PjgH0Aaf0rwEX5+IBtqsXNzKxOSi0iEfFGREwBxpP1HD5Y5vdVI2mBpG5J3X19fY1Iwczsbakuo7Mi4mXgCeDfAyMl9T89eDzQm+Z7gQkAaf17gJfy8QHbVItX+v5lEdERER1tbW1nY5fMzIxyR2e1SRqZ5s8HrgKeIysm16Zm84A1ab4rLZPW/yQiIsXnptFbE4FJwCZgMzApjfYaQXbxvaus/TEzs7cq830iY4EVaRTVO4DVEfGIpJ3AKklfB54B7kvt7wMekNQDHCYrCkTEDkmrgZ3ACWBhRLwBIOkmYB0wDFgeETtK3B8zMxugtCISEVuBSyvE95BdHxkY/xfgc1U+azGwuEJ8LbD2jJM1M7NCfMe6mZkV5iJiZmaFuYiYmVlhLiJmZlaYi4iZmRXmImJmZoW5iJiZWWEuImZmVpiLiJmZFeYiYmZmhbmImJlZYS4iZmZWmIuImZkV5iJiZmaFuYiYmVlhLiJmZlaYi4iZmRXmImJmZoWVVkQkTZD0hKSdknZI+lKK3y6pV9KWNM3KbXOrpB5JuyTNyMU7U6xH0qJcfKKkjSn+oKQRZe2PmZm9VZk9kRPAX0TEZGAasFDS5LRuSURMSdNagLRuLvAhoBO4W9IwScOAu4CZwGTgutzn3Jk+6/3AEWB+iftjZmYD1FREJP3e6X5wRByIiKfT/KvAc8C4QTaZDayKiGMR8TzQA1yWpp6I2BMRx4FVwGxJAq4AHkrbrwDmnG6eZmZWXK09kbslbZL0RUnvOd0vkdQOXApsTKGbJG2VtFzSqBQbB+zLbbY/xarFLwJejogTA+KVvn+BpG5J3X19faebvpmZVVFTEYmITwFfACYAT0n6nqSratlW0ruBHwA3R8RR4B7gfcAU4ADwjQJ5n5aIWBYRHRHR0dbWVvbXmZkNGcNrbRgRuyX9d6AbWApcmk4pfTkiflhpG0nnkBWQ7/a3iYiDufX3Ao+kxV6yItVvfIpRJf4SMFLS8NQbybc3M7M6qPWayEckLSG7rnEF8JmI+HdpfkmVbQTcBzwXEd/Mxcfmmn0W2J7mu4C5ks6VNBGYBGwCNgOT0kisEWQX37siIoAngGvT9vOANbXsj5mZnR219kT+J/Adsl7Hr/uDEfGr1Dup5BPAnwDbJG1JsS+Tja6aAgSwF/jT9Fk7JK0GdpKN7FoYEW8ASLoJWAcMA5ZHxI70ebcAqyR9HXiGrGiZmVmd1FpErgZ+nftH/R3AeRHxekQ8UGmDiPg5oAqr1lb7kohYDCyuEF9babuI2EM2esvMzBqg1tFZPwbOzy2/M8XMzGwIq7WInBcRr/UvpPl3lpOSmZm1ilqLyD9Lmtq/IOljwK8HaW9mZkNArddEbgb+XtKvyK5z/BvgP5WVVCtrX/Toyfm9d1zdwEzMzMpXUxGJiM2SPgh8IIV2RcS/lpeWmZm1gppvNgQ+DrSnbaZKIiJWlpKVmZm1hJqKiKQHyB5VsgV4I4UDcBExMxvCau2JdACT013iZmZmQO2js7aTXUw3MzM7qdaeyMXATkmbgGP9wYi4ppSszMysJdRaRG4vMwkzM2tNtQ7x/amk3wUmRcSPJb2T7GGIZmY2hNX6KPgbyV5D++0UGgf8Q0k5mZlZi6j1wvpCske7H4XsBVXA75SVlJmZtYZai8ixiDjevyBpONl9ImZmNoTVemH9
p5K+DJyf3q3+ReD/lJdWc8o/F8vMzGrviSwC+oBtZG8iXAtUe6OhmZkNETUVkYj4TUTcGxGfi4hr0/ygp7MkTZD0hKSdknZI+lKKj5a0XtLu9HNUikvSUkk9krYOePT8vNR+t6R5ufjHJG1L2yxN73U3M7M6qXV01vOS9gycTrHZCeAvImIyMA1YKGkyWa/m8YiYBDyelgFmApPStAC4J333aOA24HKyV+He1l94Upsbc9t11rI/ZmZ2dpzOs7P6nQd8Dhg92AYRcQA4kOZflfQc2dDg2cCnU7MVwJPALSm+MvVwNkgaKWlsars+Ig4DSFoPdEp6ErgwIjak+EpgDvBYjftkZmZnqNbTWS/lpt6I+Fug5jcuSWoHLgU2AmNSgQF4ERiT5scB+3Kb7U+xweL7K8Qrff8CSd2Suvv6+mpN28zMTqHWR8FPzS2+g6xnUuu27wZ+ANwcEUfzly0iIiSVPlQ4IpYBywA6Ojo8NNnM7Cyp9XTWN3LzJ4C9wOdPtZGkc8gKyHcj4ocpfFDS2Ig4kE5XHUrxXmBCbvPxKdbLb09/9cefTPHxFdqbmVmd1Ho66w9y01URcWNE7BpsmzRS6j7guYj4Zm5VF9A/wmoesCYXvz6N0poGvJJOe60DpksalS6oTwfWpXVHJU1L33V97rPMzKwOaj0l9eeDrR9QJPp9AvgTYJukLSn2ZeAOYLWk+cAL/LZHsxaYBfQArwM3pM8+LOlrwObU7qv9F9nJbnq8Hzif7IK6L6qbmdXR6YzO+jhZbwHgM8AmYHe1DSLi50C1+zaurNA+yJ7RVemzlgPLK8S7gQ8PlriZmZWn1iIyHpgaEa8CSLodeDQi/risxMzMrPnV+tiTMcDx3PJxfjs018zMhqhaeyIrgU2SHk7Lc8huFDQzsyGs1jcbLpb0GPCpFLohIp4pLy0zM2sFtZ7OAngncDQivgXslzSxpJzMzKxF1PoAxtvInm91awqdA/zvspIyM7PWUGtP5LPANcA/A0TEr4ALykrKzMxaQ61F5Hi6jyMAJL2rvJTMzKxV1FpEVkv6NjBS0o3Aj4F7y0vLzMxawSlHZ6XnUj0IfBA4CnwA+KuIWF9ybmZm1uROWUTS49rXRsTvAS4cZmZ2Uq2ns56W9PFSMzEzs5ZT6x3rlwN/LGkv2QgtkXVSPlJWYmZm1vwGLSKSLomI/wfMqFM+ZmbWQk7VE/kHsqf3viDpBxHxR3XIyczMWsSpronk3wfy3jITMTOz1nOqIhJV5s3MzE5ZRD4q6aikV4GPpPmjkl6VdHSwDSUtl3RI0vZc7HZJvZK2pGlWbt2tknok7ZI0IxfvTLEeSYty8YmSNqb4g5JGnP7ul6t90aMnJzOzt6NBi0hEDIuICyPigogYnub7ly88xWffD3RWiC+JiClpWgsgaTIwF/hQ2uZuScMkDQPuAmYCk4HrUluAO9NnvR84AsyvbZfNzOxsOZ1HwZ+WiPgZcLjG5rOBVRFxLCKeB3qAy9LUExF7IuI4sAqYne6ivwJ4KG2/guxFWWZmVkelFZFB3CRpazrdNSrFxgH7cm32p1i1+EXAyxFxYkC8IkkLJHVL6u7r6ztb+2FmNuTVu4jcA7wPmAIcAL5Rjy+NiGUR0RERHW1tbfX4SjOzIaHWO9bPiog42D8v6V7gkbTYC0zINR2fYlSJv0T2ROHhqTeSb29mZnVS156IpLG5xc8C/SO3uoC5ks5Nr92dBGwCNgOT0kisEWQX37vSu02eAK5N288D1tRjH8zM7LdK64lI+j7waeBiSfuB24BPS5pCds/JXuBPASJih6TVwE7gBLAwIt5In3MTsA4YBiyPiB3pK24BVkn6OvAMcF9Z+2JmZpWVVkQi4roK4ar/0EfEYmBxhfhaYG2F+B6y0VtmZtYgjRidZWZmbxMuImZmVpiLiJmZFeYiYmZmhbmImJlZYS4iZmZWmIuImZkV5iJiZmaFuYiYmVlhLiJmZlaYi4iZmRXmImJmZoW5
iJiZWWEuImZmVpiLiJmZFeYiYmZmhbmImJlZYaUVEUnLJR2StD0XGy1pvaTd6eeoFJekpZJ6JG2VNDW3zbzUfrekebn4xyRtS9sslaSy9uVsaF/06MnJzOztosyeyP1A54DYIuDxiJgEPJ6WAWYCk9K0ALgHsqJD9m72y8lehXtbf+FJbW7MbTfwu8zMrGSlFZGI+BlweEB4NrAiza8A5uTiKyOzARgpaSwwA1gfEYcj4giwHuhM6y6MiA0REcDK3GeZmVmd1PuayJiIOJDmXwTGpPlxwL5cu/0pNlh8f4V4RZIWSOqW1N3X13dme2BmZic17MJ66kFEnb5rWUR0RERHW1tbPb7SzGxIqHcROZhORZF+HkrxXmBCrt34FBssPr5C3MzM6qjeRaQL6B9hNQ9Yk4tfn0ZpTQNeSae91gHTJY1KF9SnA+vSuqOSpqVRWdfnPsvMzOpkeFkfLOn7wKeBiyXtJxtldQewWtJ84AXg86n5WmAW0AO8DtwAEBGHJX0N2JzafTUi+i/Wf5FsBNj5wGNpMjOzOiqtiETEdVVWXVmhbQALq3zOcmB5hXg38OEzydHMzM6M71g3M7PCXETMzKwwFxEzMyvMRcTMzApzETEzs8JcRMzMrDAXETMzK8xFxMzMCivtZkOrLv9iqr13XN3ATMzMzox7ImZmVpiLiJmZFeYiYmZmhbmImJlZYS4iZmZWmIuImZkV5iJiZmaFuYiYmVlhDSkikvZK2iZpi6TuFBstab2k3ennqBSXpKWSeiRtlTQ19znzUvvdkuZV+z4zMytHI3sifxARUyKiIy0vAh6PiEnA42kZYCYwKU0LgHsgKzpk722/HLgMuK2/8JiZWX000+ms2cCKNL8CmJOLr4zMBmCkpLHADGB9RByOiCPAeqCzzjmbmQ1pjXp2VgA/khTAtyNiGTAmIg6k9S8CY9L8OGBfbtv9KVYt3lL8HC0za2WNKiKfjIheSb8DrJf0i/zKiIhUYM4KSQvIToVxySWXnK2PNTMb8hpyOisietPPQ8DDZNc0DqbTVKSfh1LzXmBCbvPxKVYtXun7lkVER0R0tLW1nc1dMTMb0upeRCS9S9IF/fPAdGA70AX0j7CaB6xJ813A9WmU1jTglXTaax0wXdKodEF9eoqZmVmdNOJ01hjgYUn93/+9iPi/kjYDqyXNB14APp/arwVmAT3A68ANABFxWNLXgM2p3Vcj4nD9dsPMzOpeRCJiD/DRCvGXgCsrxANYWOWzlgPLz3aOZmZWm2Ya4mtmZi3GRcTMzApzETEzs8JcRMzMrLBG3Wxop+A72c2sFbgnYmZmhbmImJlZYS4iZmZWmIuImZkV5gvrLcAX2c2sWbknYmZmhbkn0kTyPQ4zs1bgItJifGrLzJqJT2eZmVlh7om0MPdKzKzRXETeJgZeT3FRMbN6cBF5m3IvxczqwddEzMyssJbviUjqBL4FDAO+ExF3NDilplOtV+LeipmdqZYuIpKGAXcBVwH7gc2SuiJiZ2Mza17V7kWp5R4VFxozG6iliwhwGdATEXsAJK0CZgMuIiWo582QLlhmraHVi8g4YF9ueT9w+cBGkhYAC9Lia5J2neb3XAz8U6EM66fZczyt/HRniZlU97Y6hg3S7Dk2e37QvDn+bqVgqxeRmkTEMmBZ0e0ldUdEx1lM6axr9hybPT9o/hybPT9o/hybPT9ojRzzWn10Vi8wIbc8PsXMzKwOWr2IbAYmSZooaQQwF+hqcE5mZkNGS5/OiogTkm4C1pEN8V0eETtK+KrCp8LqqNlzbPb8oPlzbPb8oPlzbPb8oDVyPEkR0egczMysRbX66SwzM2sgFxEzMyvMReQUJHVK2iWpR9KiJshngqQnJO2UtEPSl1L8dkm9krakaVaD89wraVvKpTvFRktaL2l3+jmqQbl9IHectkg6KunmRh9DScslHZK0PRereMyUWZr+XG6VNLVB+f2NpF+kHB6WNDLF2yX9Oncs/67s/AbJ
servVdKt6RjukjSjQfk9mMttr6QtKd6QY3jaIsJTlYnsYv0vgfcCI4BngckNzmksMDXNXwD8IzAZuB34r40+Zrk89wIXD4j9D2BRml8E3NkEeQ4DXiS7kaqhxxD4fWAqsP1UxwyYBTwGCJgGbGxQftOB4Wn+zlx+7fl2DT6GFX+v6e/Ns8C5wMT0d31YvfMbsP4bwF818hie7uSeyOBOPlYlIo4D/Y9VaZiIOBART6f5V4HnyO7cbwWzgRVpfgUwp3GpnHQl8MuIeKHRiUTEz4DDA8LVjtlsYGVkNgAjJY2td34R8aOIOJEWN5Ddq9UwVY5hNbOBVRFxLCKeB3rI/s6XZrD8JAn4PPD9MnM421xEBlfpsSpN8w+2pHbgUmBjCt2UTissb9SpopwAfiTpqfTYGYAxEXEgzb8IjGlMam8ylzf/pW2mYwjVj1kz/tn8L2S9o34TJT0j6aeSPtWopJJKv9dmO4afAg5GxO5crJmOYUUuIi1K0ruBHwA3R8RR4B7gfcAU4ABZt7iRPhkRU4GZwEJJv59fGVl/vaHjy9MNqtcAf59CzXYM36QZjlk1kr4CnAC+m0IHgEsi4lLgz4HvSbqwQek19e815zre/B+aZjqGVbmIDK4pH6si6RyyAvLdiPghQEQcjIg3IuI3wL2U3C0/lYjoTT8PAQ+nfA72n3JJPw81LkMgK3BPR8RBaL5jmFQ7Zk3zZ1PSfwb+EPhCKnSkU0QvpfmnyK43/NtG5DfI77WZjuFw4D8CD/bHmukYDsZFZHBN91iVdN70PuC5iPhmLp4/H/5ZYPvAbetF0rskXdA/T3bxdTvZsZuXms0D1jQmw5Pe9D+/ZjqGOdWOWRdwfRqlNQ14JXfaq26UvRTuL4FrIuL1XLxN2ft+kPReYBKwp975pe+v9nvtAuZKOlfSRLIcN9U7v+Q/AL+IiP39gWY6hoNq9JX9Zp/IRsH8I9n/Ar7SBPl8kuyUxlZgS5pmAQ8A21K8CxjbwBzfSzbq5VlgR/9xAy4CHgd2Az8GRjcwx3cBLwHvycUaegzJCtoB4F/Jzs/Pr3bMyEZl3ZX+XG4DOhqUXw/ZdYX+P4t/l9r+UfrdbwGeBj7TwGNY9fcKfCUdw13AzEbkl+L3A382oG1DjuHpTn7siZmZFebTWWZmVpiLiJmZFeYiYmZmhbmImJlZYS4iZmZWmIuImZkV5iJiZmaF/X9qu3EPj/wGywAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Number of distinct cids each user has watched\n",
    "seq_vid_train.groupby('did')['cid'].nunique().plot(kind='hist', bins=100)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "6a239297-4edb-4747-9057-de333ab38778",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-07-13T08:09:00.891870Z",
     "iopub.status.busy": "2022-07-13T08:09:00.891688Z",
     "iopub.status.idle": "2022-07-13T08:09:02.027933Z",
     "shell.execute_reply": "2022-07-13T08:09:02.027442Z",
     "shell.execute_reply.started": "2022-07-13T08:09:00.891853Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<AxesSubplot:ylabel='Frequency'>"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAZEAAAD4CAYAAAAtrdtxAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAAsTAAALEwEAmpwYAAAVwUlEQVR4nO3da6xd5X3n8e8vBhJyKxc7DLLJmCRWMk4bHOIAVTIaQhQw0MZ0hjKgNlgME1eKkRINo4mJRkNujMiLhpYqQaXFiskkMTQ3PMEd16Womb7g4gAFDEGcEjPYIdjFEJKmAwP9z4v9HLPrnGNvL87ex+ec70fa2mv912U/j9j4d5611l4rVYUkSV28arobIEmauQwRSVJnhogkqTNDRJLUmSEiSerssOluwKjNnz+/Fi9ePN3NkKQZY/78+WzevHlzVa3Yd9mcC5HFixezdevW6W6GJM0oSeZPVPdwliSpM0NEktSZISJJ6swQkSR1ZohIkjozRCRJnRkikqTODBFJUmeGiCSpszn3i/VRWrz21r3T268+dxpbIknD4UhEktSZISJJ6swQkSR1ZohIkjozRCRJnRkikqTODBFJUmeGiCSpM0NEktSZISJJ6swQkSR1NrQQSfKaJHcl+dsk25J8ptVPTHJnkrEkNyU5otVf3ebH2vLFffu6otUfSXJWX31Fq40lWTusvkiSJjbMkcjzwBlVdRKwDFiR5DTgC8A1VfU24Bng0rb+pcAzrX5NW48kS4ELgXcCK4AvJ5mXZB7wJeBsYClwUVtXkjQiQwuR6vl5mz28vQo4A/hmq68HzmvTK9s8bfkHk6TVN1TV81X1I2AMOKW9xqrqsap6AdjQ1pUkjchQz4m0EcN9wC5gC/B3wLNV9WJbZQewsE0vBJ4AaMt/ChzbX99nm8nqE7VjdZKtSbbu3r17CnomSYIhh0hVvVRVy4BF9EYO7xjm5+2nHddX1fKqWr5gwYLpaIIkzUojuTqrqp4Fbgd+HTgqyfjDsBYBO9v0TuAEgLb8V4Cn++v7bDNZXZI0IsO8OmtBkqPa9JHAh4CH6YXJ+W21VcAtbXpjm6ct/6uqqla/sF29dSKwBLgLuBtY0q72OoLeyfeNw+qPJOmXDfPxuMcD69tVVK8Cbq6q7yV5CNiQ5PPAvcANbf0bgK8mGQP20AsFqmpbkpuBh4AXgTVV9RJAksuAzcA8YF1VbRtifyRJ+xhaiFTV/cC7J6g/Ru/8yL71/wv89iT7ugq4aoL6JmDTK26sJKkTf7EuSerMEJEkdWaISJI6G+aJ9Tlj8dpb905vv/rcaWyJJI2WIxFJUmeGiCSpM0NEktSZISJJ6swQkSR15tVZ08CruSTNFo5EJEmdGSKSpM4MEUlSZ4aIJKkzQ0SS1JkhIknqzBCRJHVmiEiSOjNEJEmdGSKSpM4MEUlSZ4aIJKkzQ0SS1NnQQiTJCUluT/JQkm1JPt7qn06yM8l97XVO3zZXJBlL8kiSs/rqK1ptLMnavvqJSe5s9ZuSHDGs/kiSftkwRyIvApdX1VLgNGBNkqVt2TVVtay9NgG0ZRcC7wRWAF9OMi/JPOBLwNnAUuCivv18oe3rbcAzwKVD7I8kaR9DC5GqerKq7mnTPwMeBhbuZ5OVwIaqer6qfgSMAae011hVPVZVLwAbgJVJApwBfLNtvx44byidkSRNaCTnRJIsBt4N3NlKlyW5P8m6JEe32kLgib7NdrTaZPVjgWer6sV96hN9/uokW5Ns3b1791R0SZLECEIkyeuBbwGfqKrngOuAtwLLgCeB3x92G6rq+qpaXlXLFyxYMOyPk6Q5Y6iPx01yOL0A+VpVfRugqp7qW/4nwPfa7E7ghL7NF7Uak9SfBo5KclgbjfSvL0kagWFenRXgBuDhqvpiX/34vtV+C3iwTW8ELkzy6iQnAkuAu4C7gSXtSqwj6J1831hVBdwOnN+2XwXcMqz+SJJ+2TBHIu8DPgI8
kOS+VvsUvaurlgEFbAd+D6CqtiW5GXiI3pVda6rqJYAklwGbgXnAuqra1vb3SWBDks8D99ILLUnSiAwtRKrqb4BMsGjTfra5CrhqgvqmibarqsfoXb0lSZoG/mJdktSZISJJ6swQkSR1ZohIkjozRCRJnRkikqTODBFJUmeGiCSpM0NEktSZISJJ6swQkSR1ZohIkjozRCRJnRkikqTODBFJUmeGiCSps6E+Y322Wbz21r3T268+dxpbIkmHBkcikqTODBFJUmeGiCSpM0NEktSZISJJ6mygEEnya8NuiCRp5hl0JPLlJHcl+ViSXxlkgyQnJLk9yUNJtiX5eKsfk2RLkkfb+9GtniTXJhlLcn+Sk/v2taqt/2iSVX319yR5oG1zbZIcRN8lSa/QQCFSVf8a+B3gBOAHSb6e5EMH2OxF4PKqWgqcBqxJshRYC9xWVUuA29o8wNnAkvZaDVwHvdABrgROBU4BrhwPnrbOR/u2WzFIfw5Vi9feuvclSTPBwOdEqupR4L8CnwT+DXBtkh8m+beTrP9kVd3Tpn8GPAwsBFYC69tq64Hz2vRK4MbquQM4KsnxwFnAlqraU1XPAFuAFW3ZG6vqjqoq4Ma+fUmSRmDQcyLvSnINvSA4A/jNqvpXbfqaAbZfDLwbuBM4rqqebIt+AhzXphcCT/RttqPV9lffMUF9os9fnWRrkq27d+8+UHMlSQMadCTyR8A9wElVtaZvhPFjeqOTSSV5PfAt4BNV9Vz/sjaCqINu9UGqquuranlVLV+wYMGwP06S5oxBQ+Rc4OtV9Y8ASV6V5LUAVfXVyTZKcji9APlaVX27lZ9qh6Jo77tafSe9cy7jFrXa/uqLJqhLkkZk0BD5S+DIvvnXttqk2pVSNwAPV9UX+xZtBMavsFoF3NJXv7hdpXUa8NN22GszcGaSo9sJ9TOBzW3Zc0lOa591cd++JEkjMOhdfF9TVT8fn6mqn4+PRPbjfcBHgAeS3NdqnwKuBm5OcinwOHBBW7YJOAcYA34BXNI+a0+SzwF3t/U+W1V72vTHgK/QC7g/by9J0ogMGiL/kOTk8XMhSd4D/OP+NqiqvwEm+93GBydYv4A1k+xrHbBugvpW4Ff333RJ0rAMGiKfAP4syY/pBcO/AP79sBolSZoZBgqRqro7yTuAt7fSI1X1/4bXLEnSTHAwTzZ8L7C4bXNyEqrqxqG0SpI0IwwUIkm+CrwVuA94qZXHfyUuSZqjBh2JLAeWtpPfkiQBg/9O5EF6J9MlSdpr0JHIfOChJHcBz48Xq+rDQ2mVJGlGGDREPj3MRkiSZqZBL/H96yT/ElhSVX/Zfq0+b7hNkyQd6ga9FfxHgW8Cf9xKC4HvDqlNkqQZYtAT62vo3QvrOdj7gKo3DatRkqSZYdAQeb6qXhifSXIYI3gOiCTp0DZoiPx1kk8BR7Znq/8Z8D+H1yxJ0kwwaIisBXYDDwC/R++27ft9oqEkafYb9OqsfwL+pL0kSQIGv3fWj5jgHEhVvWXKWyRJmjEO5t5Z414D/DZwzNQ3R5I0kwx0TqSqnu577ayqPwDOHW7TJEmHukEPZ53cN/sqeiOTg3kWiSRpFho0CH6/b/pFYDtwwZS3RpI0owx6ddYHht0QSdLMM+jhrP+0v+VV9cWpaY4kaSY5mKuz3gtsbPO/CdwFPDqMRkmSZoZBf7G+CDi5qi6vqsuB9wBvrqrPVNVnJtogyboku5I82Ff7dJKdSe5rr3P6ll2RZCzJI0nO6quvaLWxJGv76icmubPVb0pyxMF2XpL0ygwaIscBL/TNv9Bq+/MVYMUE9Wuqall7bQJIshS4EHhn2+bLSeYlmQd8CTgbWApc1NYF+ELb19uAZ4BLB+yLJGmKDBoiNwJ3tZHEp4E7gfX726Cqvg/sGXD/K4ENVfV8Vf0IGANOaa+xqnqs3UV4A7AySYAz6D3jhNaW8wb8LEnSFBn0x4ZXAZfQ+4v/GeCSqvrvHT/zsiT3t8NdR7fa
QuCJvnV2tNpk9WOBZ6vqxX3qE0qyOsnWJFt3797dsdmSpH0NOhIBeC3wXFX9IbAjyYkdPu864K3AMuBJ/vnvT4amqq6vquVVtXzBggWj+EhJmhMGfTzulcAngSta6XDgfxzsh1XVU1X1Ut9dgU9pi3YCJ/StuqjVJqs/DRzVHo7VX5ckjdCgI5HfAj4M/ANAVf0YeMPBfliS4/fZ5/iVWxuBC5O8uo1wltC7hPhuYEm7EusIeiffN1ZVAbcD57ftVwG3HGx7JEmvzKC/E3mhqipJASR53YE2SPIN4HRgfpIdwJXA6UmW0but/HZ6D7iiqrYluRl4iN5tVdZU1UttP5cBm4F5wLqq2tY+4pPAhiSfB+4FbhiwL5KkKTJoiNyc5I/pHUL6KPAfOMADqqrqognKk/5D307eXzVBfRO9JynuW3+Mlw+HSZKmwQFDpF1OexPwDuA54O3Af6uqLUNum5rFa2/dO739au/AL+nQccAQaYexNlXVrwEGhyRpr0FPrN+T5L1DbYkkacYZ9JzIqcDvJtlO7wqt0BukvGtYDZMkHfr2GyJJ3lxV/wc4a3/rSZLmpgONRL5L7+69jyf5VlX9uxG0SZI0QxzonEj6pt8yzIZIkmaeA4VITTItSdIBD2edlOQ5eiOSI9s0vHxi/Y1DbZ0k6ZC23xCpqnmjaogkaeY5mFvBS5L0zxgikqTODBFJUmeGiCSpM0NEktSZISJJ6swQkSR1ZohIkjozRCRJnRkikqTODBFJUmeGiCSpM0NEktTZ0EIkyboku5I82Fc7JsmWJI+296NbPUmuTTKW5P4kJ/dts6qt/2iSVX319yR5oG1zbZIgSRqpYY5EvgKs2Ke2FritqpYAt7V5gLOBJe21GrgOeqEDXAmcCpwCXDkePG2dj/Ztt+9nSZKGbGghUlXfB/bsU14JrG/T64Hz+uo3Vs8dwFFJjgfOArZU1Z6qegbYAqxoy95YVXdUVQE39u1LkjQioz4nclxVPdmmfwIc16YXAk/0rbej1fZX3zFBfUJJVifZmmTr7t27X1kPJEl7TduJ9TaCGMlz26vq+qpaXlXLFyxYMIqPlKQ5YdQh8lQ7FEV739XqO4ET+tZb1Gr7qy+aoC5JGqFRh8hGYPwKq1XALX31i9tVWqcBP22HvTYDZyY5up1QPxPY3JY9l+S0dlXWxX37kiSNyGHD2nGSbwCnA/OT7KB3ldXVwM1JLgUeBy5oq28CzgHGgF8AlwBU1Z4knwPubut9tqrGT9Z/jN4VYEcCf95ekqQRGlqIVNVFkyz64ATrFrBmkv2sA9ZNUN8K/OoraeNMt3jtrXunt1997jS2RNJc5S/WJUmdGSKSpM4MEUlSZ4aIJKkzQ0SS1JkhIknqzBCRJHVmiEiSOjNEJEmdGSKSpM4MEUlSZ4aIJKkzQ0SS1JkhIknqzBCRJHVmiEiSOjNEJEmdGSKSpM4MEUlSZ4aIJKkzQ0SS1Nlh090ADcfitbfund5+9bnT2BJJs5kjEUlSZ9MSIkm2J3kgyX1JtrbaMUm2JHm0vR/d6klybZKxJPcnOblvP6va+o8mWTUdfZGkuWw6RyIfqKplVbW8za8FbquqJcBtbR7gbGBJe60GroNe6ABXAqcCpwBXjgePJGk0DqXDWSuB9W16PXBeX/3G6rkDOCrJ8cBZwJaq2lNVzwBbgBUjbrMkzWnTFSIF/EWSHyRZ3WrHVdWTbfonwHFteiHwRN+2O1ptsvovSbI6ydYkW3fv3j1VfZCkOW+6rs56f1XtTPImYEuSH/YvrKpKUlP1YVV1PXA9wPLly6dsv5I0103LSKSqdrb3XcB36J3TeKodpqK972qr7wRO6Nt8UatNVpckjcjIQyTJ65K8YXwaOBN4ENgIjF9htQq4pU1vBC5uV2mdBvy0HfbaDJyZ5Oh2Qv3MVpMkjch0HM46DvhOkvHP/3pV/a8kdwM3J7kUeBy4oK2/CTgHGAN+AVwCUFV7knwOuLut99mq2jO6
bkiSRh4iVfUYcNIE9aeBD05QL2DNJPtaB6yb6jZKkgZzKF3iK0maYQwRSVJn3oBxjvHGjJKmkiMRSVJnhogkqTNDRJLUmSEiSerMEJEkdWaISJI6M0QkSZ0ZIpKkzvyxoQB/hCipG0cikqTODBFJUmeGiCSpM0NEktSZJ9a1X55wl7Q/jkQkSZ0ZIpKkzjycpU48zCUJHIlIkl4BRyKaUo5QpLnFkYgkqbMZPxJJsgL4Q2Ae8KdVdfU0N0kTmGyE4shFmtlmdIgkmQd8CfgQsAO4O8nGqnpoelumV8rQkWaGGR0iwCnAWFU9BpBkA7ASMETmmEFCZ3/LXkldmstSVdPdhs6SnA+sqKr/2OY/ApxaVZfts95qYHWbfTvwyAF2PR/4+ylu7qHOPs9+c62/YJ+nyt8DVNWKfRfM9JHIQKrqeuD6QddPsrWqlg+xSYcc+zz7zbX+gn0ehZl+ddZO4IS++UWtJkkagZkeIncDS5KcmOQI4EJg4zS3SZLmjBl9OKuqXkxyGbCZ3iW+66pq2xTseuBDX7OIfZ795lp/wT4P3Yw+sS5Jml4z/XCWJGkaGSKSpM4MkT5JViR5JMlYkrXT3Z5hSbIuya4kD/bVjkmyJcmj7f3o6WzjVEpyQpLbkzyUZFuSj7f6bO7za5LcleRvW58/0+onJrmzfcdvahekzCpJ5iW5N8n32vys7nOS7UkeSHJfkq2tNrLvtiHS9N1C5WxgKXBRkqXT26qh+Qqw74+G1gK3VdUS4LY2P1u8CFxeVUuB04A17b/tbO7z88AZVXUSsAxYkeQ04AvANVX1NuAZ4NLpa+LQfBx4uG9+LvT5A1W1rO/3ISP7bhsiL9t7C5WqegEYv4XKrFNV3wf27FNeCaxv0+uB80bZpmGqqier6p42/TN6/8AsZHb3uarq52328PYq4Azgm60+q/oMkGQRcC7wp20+zPI+T2Jk321D5GULgSf65ne02lxxXFU92aZ/Ahw3nY0ZliSLgXcDdzLL+9wO69wH7AK2AH8HPFtVL7ZVZuN3/A+A/wL8U5s/ltnf5wL+IskP2i2eYITf7Rn9OxENR1VVkll37XeS1wPfAj5RVc/1/kjtmY19rqqXgGVJjgK+A7xjels0XEl+A9hVVT9Icvo0N2eU3l9VO5O8CdiS5If9C4f93XYk8rK5fguVp5IcD9Ded01ze6ZUksPpBcjXqurbrTyr+zyuqp4Fbgd+HTgqyfgfj7PtO/4+4MNJttM7HH0GvWcNzeY+U1U72/suen8snMIIv9uGyMvm+i1UNgKr2vQq4JZpbMuUasfFbwAerqov9i2azX1e0EYgJDmS3jN3HqYXJue31WZVn6vqiqpaVFWL6f3/+1dV9TvM4j4neV2SN4xPA2cCDzLC77a/WO+T5Bx6x1THb6Fy1fS2aDiSfAM4nd4to58CrgS+C9wMvBl4HLigqvY9+T4jJXk/8L+BB3j5WPmn6J0Xma19fhe9E6rz6P2xeHNVfTbJW+j9lX4McC/wu1X1/PS1dDja4az/XFW/MZv73Pr2nTZ7GPD1qroqybGM6LttiEiSOvNwliSpM0NEktSZISJJ6swQkSR1ZohIkjozRCRJnRkikqTO/j8z5uM+IeNRagAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Number of distinct series_ids each user has watched\n",
    "seq_vid_train.groupby('did')['series_id'].nunique().plot(\n",
    "    kind='hist', bins=100)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "790bc53f-f116-4c4c-af3b-3cabe5059855",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-07-13T08:09:02.029165Z",
     "iopub.status.busy": "2022-07-13T08:09:02.028985Z",
     "iopub.status.idle": "2022-07-13T08:09:03.385158Z",
     "shell.execute_reply": "2022-07-13T08:09:03.384677Z",
     "shell.execute_reply.started": "2022-07-13T08:09:02.029148Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "dd4ad1276e13603b6a15704295ba6ec8    0.476768\n",
       "0cc56b6914c87d906b23e6d90cd4f65e    0.359648\n",
       "815a870db0d94166e330ae97f7068de2    0.225529\n",
       "db978f94b983a6ed3021e1e927ee09ef    0.199913\n",
       "7d697b54355da49d2a6949e9969bf828    0.196432\n",
       "                                      ...   \n",
       "2e0b8a702cb0edbf0ceb7a8f623401b0    0.000006\n",
       "78e73c5dabb7df3875e0ec103e2336c9    0.000006\n",
       "99300ebe9c68d78e17fec043408c1d85    0.000006\n",
       "aa7368c717080d5f1a7f3f6c3bac08fe    0.000006\n",
       "a84a780911a583e8e62fce0f90d4e1c1    0.000006\n",
       "Name: vid, Length: 147074, dtype: float64"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Views per user for each vid: popular vids are watched far more often\n",
    "seq_vid_train['vid'].value_counts() / seq_vid_train['did'].nunique()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "71e248e9-27c1-4335-8397-3985d8a8b606",
   "metadata": {
    "tags": []
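   },
   "source": [
    "# User behavior segments\n",
    "\n",
    "- Users whose entire history stays within a single cid / series_id, with serialno increasing steadily through the sequence; these make up 4% of users.\n",
    "\n",
    "- Users who have recently been watching videos from one cid / series_id\n",
    "    - Some skip serialno values while watching\n",
    "    - Some watch videos from other cids / series_ids in between\n",
    "\n",
    "- Users who have recently been watching videos from one cid / series_id and have finished every serialno\n",
    "    - Some start over from the beginning\n",
    "    - Some move on to new videos\n",
    "\n",
    "# Recall strategies\n",
    "\n",
    "1. vid association rules (count-based)\n",
    "    - Find the next vid most strongly associated with the last vid watched;\n",
    "    - Find the next vid most strongly associated with the last 5 vids watched;\n",
    "    - Find the next vid most strongly associated with the vid that has the longest historical watch time;\n",
    "2. Word-vector similarity computed over vids\n"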
   ]
  },
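  {
   "cell_type": "markdown",
   "id": "sketch-next-vid-counts-md",
   "metadata": {},
   "source": [
    "The first recall strategy above (the vid that most often follows the last watched vid) can be sketched on toy data as follows. This is a minimal illustration rather than the notebook's actual pipeline: `build_next_vid_table`, `toy_sequences`, and the `min_count` threshold are hypothetical names introduced here, and the vids are made up."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-next-vid-counts-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "from collections import Counter, defaultdict\n",
    "\n",
    "\n",
    "def build_next_vid_table(sequences, min_count=1):\n",
    "    # sequences: per-user vid lists in watch order (toy input)\n",
    "    # min_count: hypothetical threshold; rarer pairs are dropped\n",
    "    pair_counts = Counter()\n",
    "    for seq in sequences:\n",
    "        # Count every adjacent (current, next) pair in one user's history\n",
    "        pair_counts.update(zip(seq, seq[1:]))\n",
    "\n",
    "    table = defaultdict(list)\n",
    "    # most_common() yields pairs by descending count, so each list stays ranked\n",
    "    for (cur, nxt), n in pair_counts.most_common():\n",
    "        if n >= min_count:\n",
    "            table[cur].append(nxt)\n",
    "    return dict(table)\n",
    "\n",
    "\n",
    "toy_sequences = [['a', 'b', 'c'], ['a', 'b', 'd'], ['a', 'c'], ['b', 'c']]\n",
    "next_table = build_next_vid_table(toy_sequences)\n",
    "next_table['a']"
   ]
  },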
  {
   "cell_type": "markdown",
   "id": "ee8e513b-d894-4fe8-9163-c40d7e9ec76e",
   "metadata": {},
   "source": [
    "# Local validation split"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "67932458-5e1b-4b48-a245-5e4755581835",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-07-13T08:08:45.168745Z",
     "iopub.status.busy": "2022-07-13T08:08:45.168095Z",
     "iopub.status.idle": "2022-07-13T08:08:47.576721Z",
     "shell.execute_reply": "2022-07-13T08:08:47.576226Z",
     "shell.execute_reply.started": "2022-07-13T08:08:45.168691Z"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "\n",
    "\n",
    "# seq_vid_train.drop(['cpn', 'fpn', 'is_intact', 'title_length', 'duration',\n",
    "#                     'upgc_flag', 'stars', 'tags', 'key_word'], axis=1, inplace=True)\n",
    "\n",
    "# Features of the vid itself\n",
    "# seq_train_info['vid_freq'] = seq_train_info['vid'].map(\n",
    "#     seq_train_info['vid'].value_counts())\n",
    "\n",
    "# seq_train_info['cid_max'] = seq_train_info['cid'].map(\n",
    "#     vid_info.groupby('cid')['serialno'].max())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "fab1427c-ff6b-454e-80d8-5ab13eec2680",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-07-15T11:11:31.587546Z",
     "iopub.status.busy": "2022-07-15T11:11:31.587372Z",
     "iopub.status.idle": "2022-07-15T11:11:36.367266Z",
     "shell.execute_reply": "2022-07-15T11:11:36.366640Z",
     "shell.execute_reply.started": "2022-07-15T11:11:31.587530Z"
    },
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "did_max_seq_no = seq_vid_train.groupby('did')['seq_no'].max().reset_index()\n",
    "seq_local_valid = pd.merge(seq_vid_train, did_max_seq_no, on=['did', 'seq_no'])\n",
    "seq_local_train = seq_vid_train[~seq_vid_train['index'].isin(\n",
    "    seq_local_valid['index'])]\n",
    "\n",
    "seq_local_valid.set_index('did', inplace=True)"
   ]
  },
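  {
   "cell_type": "markdown",
   "id": "sketch-leave-last-out-md",
   "metadata": {},
   "source": [
    "The split above holds out each user's last record (the row with the largest `seq_no`) as the validation label and keeps the rest for training. A toy check of the same idea, assuming a frame with only `did`, `vid`, and `seq_no`; the `toy` frame and its values are invented for illustration:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-leave-last-out-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "toy = pd.DataFrame({\n",
    "    'did': ['u1', 'u1', 'u1', 'u2', 'u2'],\n",
    "    'vid': ['a', 'b', 'c', 'x', 'y'],\n",
    "    'seq_no': [1, 2, 3, 1, 2],\n",
    "})\n",
    "\n",
    "# A row is held out when its seq_no equals the per-user maximum\n",
    "is_last = toy['seq_no'] == toy.groupby('did')['seq_no'].transform('max')\n",
    "toy_valid = toy[is_last].set_index('did')\n",
    "toy_train = toy[~is_last]\n",
    "\n",
    "# Each user contributes exactly one validation row, and no row is lost\n",
    "assert len(toy_valid) == toy['did'].nunique()\n",
    "assert len(toy_valid) + len(toy_train) == len(toy)\n",
    "toy_valid['vid'].to_dict()"
   ]
  },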
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "018021c5-c75c-4f79-bba2-a3a44301bffc",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-07-13T08:09:23.690561Z",
     "iopub.status.busy": "2022-07-13T08:09:23.690320Z",
     "iopub.status.idle": "2022-07-13T08:09:23.754348Z",
     "shell.execute_reply": "2022-07-13T08:09:23.753332Z",
     "shell.execute_reply.started": "2022-07-13T08:09:23.690544Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "index                                                      537809\n",
       "vid                              50999680985c2e4b8e1ca38a23496571\n",
       "vts                                                      0.749654\n",
       "hb                                                       0.903926\n",
       "seq_no                                                         18\n",
       "cpn                                                             1\n",
       "fpn                                                            68\n",
       "time_gap                                                     1751\n",
       "cid                              644de3dce40c823149df609c0dde5b6d\n",
       "is_intact                                                       1\n",
       "serialno                                                       11\n",
       "classify_id                                                     1\n",
       "series_id                                                       0\n",
       "duration                                                     6495\n",
       "title_length                                                   97\n",
       "upgc_flag                                                       0\n",
       "stars           [101006331, 101027454, 101000262, 101002467, 1...\n",
       "tags                                                           []\n",
       "key_word        [501744, 247760, 226997, 574583, 613625, 49686...\n",
       "Name: 0002bcdd4bbf19b44c9badc817b160ed, dtype: object"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "seq_local_valid.loc['0002bcdd4bbf19b44c9badc817b160ed']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "749308e4-3110-4f34-9ceb-9fd7ab7e1dc5",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-07-13T08:09:23.755376Z",
     "iopub.status.busy": "2022-07-13T08:09:23.755190Z",
     "iopub.status.idle": "2022-07-13T08:09:24.526515Z",
     "shell.execute_reply": "2022-07-13T08:09:24.525406Z",
     "shell.execute_reply.started": "2022-07-13T08:09:23.755360Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "((170909, 19), (5425504, 20), (5596413, 20))"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "seq_local_valid.shape, seq_local_train.shape, seq_vid_train.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "c5fc2871-6a1e-4d2d-8e57-99cf66a975f2",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-07-13T08:09:26.365297Z",
     "iopub.status.busy": "2022-07-13T08:09:26.364715Z",
     "iopub.status.idle": "2022-07-13T08:09:26.406342Z",
     "shell.execute_reply": "2022-07-13T08:09:26.405770Z",
     "shell.execute_reply.started": "2022-07-13T08:09:26.365248Z"
    },
    "tags": []
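   },
   "outputs": [],
   "source": [
    "# Popularity fallback: the 50 most frequent vids (the name hot20 is historical).\n",
    "# Caveat: counted on the held-out labels, so local scores are optimistic; use seq_local_train for a leak-free estimate.\n",
    "hot20 = seq_local_valid['vid'].value_counts().head(50).index\n",
    "hot20 = list(hot20)"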
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "18038ac1-41b5-4a47-936b-bb820dfff152",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-07-13T08:09:28.150754Z",
     "iopub.status.busy": "2022-07-13T08:09:28.150162Z",
     "iopub.status.idle": "2022-07-13T08:09:28.167186Z",
     "shell.execute_reply": "2022-07-13T08:09:28.166638Z",
     "shell.execute_reply.started": "2022-07-13T08:09:28.150703Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>index</th>\n",
       "      <th>did</th>\n",
       "      <th>vid</th>\n",
       "      <th>vts</th>\n",
       "      <th>hb</th>\n",
       "      <th>seq_no</th>\n",
       "      <th>cpn</th>\n",
       "      <th>fpn</th>\n",
       "      <th>time_gap</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2696560</td>\n",
       "      <td>0000d0aabe8c188f88c756ce0f7f9639</td>\n",
       "      <td>f87cf2ad695b4bb5dd830ae40bf29475</td>\n",
       "      <td>44.0</td>\n",
       "      <td>3286.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1</td>\n",
       "      <td>130</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2696559</td>\n",
       "      <td>0000d0aabe8c188f88c756ce0f7f9639</td>\n",
       "      <td>fde2b2a62fb6061e4a958fb0b78c0293</td>\n",
       "      <td>1260.0</td>\n",
       "      <td>3698.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>1</td>\n",
       "      <td>130</td>\n",
       "      <td>108.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2696558</td>\n",
       "      <td>0000d0aabe8c188f88c756ce0f7f9639</td>\n",
       "      <td>2c47a9311f2f19d6b33670715cdc544d</td>\n",
       "      <td>780.0</td>\n",
       "      <td>7715.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>1</td>\n",
       "      <td>130</td>\n",
       "      <td>57645.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2696557</td>\n",
       "      <td>0000d0aabe8c188f88c756ce0f7f9639</td>\n",
       "      <td>bee6becd984e6486215f7acd876b5d26</td>\n",
       "      <td>2714.0</td>\n",
       "      <td>2611.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>1</td>\n",
       "      <td>68</td>\n",
       "      <td>546847.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2696556</td>\n",
       "      <td>0000d0aabe8c188f88c756ce0f7f9639</td>\n",
       "      <td>449a1829a8742e652cb39d6ae7523df1</td>\n",
       "      <td>1620.0</td>\n",
       "      <td>1983.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>1</td>\n",
       "      <td>130</td>\n",
       "      <td>3103.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5596408</th>\n",
       "      <td>2845463</td>\n",
       "      <td>ffff856232794b1cff1baea25bc25786</td>\n",
       "      <td>f3ff5b300b1d9678a7d9854e690e25b9</td>\n",
       "      <td>695.0</td>\n",
       "      <td>2531.0</td>\n",
       "      <td>65.0</td>\n",
       "      <td>1</td>\n",
       "      <td>130</td>\n",
       "      <td>863.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5596409</th>\n",
       "      <td>2845462</td>\n",
       "      <td>ffff856232794b1cff1baea25bc25786</td>\n",
       "      <td>3d3a565c086dc00a8fb9eb3ad8876ce9</td>\n",
       "      <td>741.0</td>\n",
       "      <td>2228.0</td>\n",
       "      <td>66.0</td>\n",
       "      <td>1</td>\n",
       "      <td>130</td>\n",
       "      <td>680.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5596410</th>\n",
       "      <td>2845461</td>\n",
       "      <td>ffff856232794b1cff1baea25bc25786</td>\n",
       "      <td>f6abae9373d6adfc32f5ea6d37e5fb17</td>\n",
       "      <td>900.0</td>\n",
       "      <td>2236.0</td>\n",
       "      <td>67.0</td>\n",
       "      <td>1</td>\n",
       "      <td>26</td>\n",
       "      <td>3278.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5596411</th>\n",
       "      <td>2845460</td>\n",
       "      <td>ffff856232794b1cff1baea25bc25786</td>\n",
       "      <td>aa09d0cf8a2c89d82d2909027897e657</td>\n",
       "      <td>360.0</td>\n",
       "      <td>2263.0</td>\n",
       "      <td>68.0</td>\n",
       "      <td>1</td>\n",
       "      <td>26</td>\n",
       "      <td>1085.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5596412</th>\n",
       "      <td>2845459</td>\n",
       "      <td>ffff856232794b1cff1baea25bc25786</td>\n",
       "      <td>be13d4527c01c4a6edc7243444108ed0</td>\n",
       "      <td>975.0</td>\n",
       "      <td>1310.0</td>\n",
       "      <td>69.0</td>\n",
       "      <td>1</td>\n",
       "      <td>26</td>\n",
       "      <td>8389.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5596413 rows × 9 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "           index                               did  \\\n",
       "0        2696560  0000d0aabe8c188f88c756ce0f7f9639   \n",
       "1        2696559  0000d0aabe8c188f88c756ce0f7f9639   \n",
       "2        2696558  0000d0aabe8c188f88c756ce0f7f9639   \n",
       "3        2696557  0000d0aabe8c188f88c756ce0f7f9639   \n",
       "4        2696556  0000d0aabe8c188f88c756ce0f7f9639   \n",
       "...          ...                               ...   \n",
       "5596408  2845463  ffff856232794b1cff1baea25bc25786   \n",
       "5596409  2845462  ffff856232794b1cff1baea25bc25786   \n",
       "5596410  2845461  ffff856232794b1cff1baea25bc25786   \n",
       "5596411  2845460  ffff856232794b1cff1baea25bc25786   \n",
       "5596412  2845459  ffff856232794b1cff1baea25bc25786   \n",
       "\n",
       "                                      vid     vts      hb  seq_no  cpn  fpn  \\\n",
       "0        f87cf2ad695b4bb5dd830ae40bf29475    44.0  3286.0     1.0    1  130   \n",
       "1        fde2b2a62fb6061e4a958fb0b78c0293  1260.0  3698.0     2.0    1  130   \n",
       "2        2c47a9311f2f19d6b33670715cdc544d   780.0  7715.0     3.0    1  130   \n",
       "3        bee6becd984e6486215f7acd876b5d26  2714.0  2611.0     4.0    1   68   \n",
       "4        449a1829a8742e652cb39d6ae7523df1  1620.0  1983.0     5.0    1  130   \n",
       "...                                   ...     ...     ...     ...  ...  ...   \n",
       "5596408  f3ff5b300b1d9678a7d9854e690e25b9   695.0  2531.0    65.0    1  130   \n",
       "5596409  3d3a565c086dc00a8fb9eb3ad8876ce9   741.0  2228.0    66.0    1  130   \n",
       "5596410  f6abae9373d6adfc32f5ea6d37e5fb17   900.0  2236.0    67.0    1   26   \n",
       "5596411  aa09d0cf8a2c89d82d2909027897e657   360.0  2263.0    68.0    1   26   \n",
       "5596412  be13d4527c01c4a6edc7243444108ed0   975.0  1310.0    69.0    1   26   \n",
       "\n",
       "         time_gap  \n",
       "0             NaN  \n",
       "1           108.0  \n",
       "2         57645.0  \n",
       "3        546847.0  \n",
       "4          3103.0  \n",
       "...           ...  \n",
       "5596408     863.0  \n",
       "5596409     680.0  \n",
       "5596410    3278.0  \n",
       "5596411    1085.0  \n",
       "5596412    8389.0  \n",
       "\n",
       "[5596413 rows x 9 columns]"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "seq_train"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7d2e01a9-2e66-4167-86c4-930f64f6b53f",
   "metadata": {
    "tags": []
   },
   "source": [
    "## Similarity computation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "2768d496-11e6-4eb8-8113-763ea38be143",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-07-13T08:32:15.149551Z",
     "iopub.status.busy": "2022-07-13T08:32:15.149117Z",
     "iopub.status.idle": "2022-07-13T08:32:35.934888Z",
     "shell.execute_reply": "2022-07-13T08:32:35.934425Z",
     "shell.execute_reply.started": "2022-07-13T08:32:15.149514Z"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "from gensim.models import Word2Vec\n",
    "# Treat each user's watch history as a sentence of vid tokens\n",
    "vid_list = seq_train.groupby(['did'])['vid'].apply(list).values\n",
    "vid_w2v = Word2Vec(sentences=vid_list[:], window=5, min_count=1, workers=8)\n",
    "\n",
    "# time_gap could be used to split each sequence into sessions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "86d073c1-235e-4f64-859e-7c576b072cb9",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-07-15T11:11:57.304036Z",
     "iopub.status.busy": "2022-07-15T11:11:57.303461Z",
     "iopub.status.idle": "2022-07-15T11:12:06.701854Z",
     "shell.execute_reply": "2022-07-15T11:12:06.700941Z",
     "shell.execute_reply.started": "2022-07-15T11:11:57.303988Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/lyz/.local/lib/python3.6/site-packages/ipykernel_launcher.py:2: SettingWithCopyWarning: \n",
      "A value is trying to be set on a copy of a slice from a DataFrame.\n",
      "Try using .loc[row_indexer,col_indexer] = value instead\n",
      "\n",
      "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
      "  \n",
      "/home/lyz/.local/lib/python3.6/site-packages/ipykernel_launcher.py:3: SettingWithCopyWarning: \n",
      "A value is trying to be set on a copy of a slice from a DataFrame.\n",
      "Try using .loc[row_indexer,col_indexer] = value instead\n",
      "\n",
      "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
      "  This is separate from the ipykernel package so we can avoid doing imports until\n"
     ]
    }
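   ],
   "source": [
    "# Copy the slice so the new columns go onto an independent frame\n",
    "# (this avoids pandas' SettingWithCopyWarning)\n",
    "tmp_seq_train = seq_train[['did', 'vid']].copy()\n",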
    "tmp_seq_train['vid_next1'] = tmp_seq_train.groupby(['did']).vid.shift(-1)\n",
    "tmp_seq_train['vid_next2'] = tmp_seq_train.groupby(['did']).vid.shift(-2)\n",
    "\n",
    "vid_next1 = tmp_seq_train.groupby(['vid', 'vid_next1']).size(\n",
    ").sort_values(ascending=False).reset_index()\n",
    "vid_next1 = vid_next1[vid_next1[0] > 100]\n",
    "vid_next1.set_index('vid', inplace=True)\n",
    "vid_next1 = vid_next1.sort_values(by=['vid', 0], ascending=False)\n",
    "vid_next1 = vid_next1.groupby('vid')['vid_next1'].apply(list).to_dict()\n",
    "\n",
    "vid_next2 = tmp_seq_train.groupby(['vid', 'vid_next1', 'vid_next2']).size(\n",
    ").sort_values(ascending=False).reset_index()\n",
    "vid_next2 = vid_next2[vid_next2[0] > 100]\n",
    "vid_next2['vid'] = vid_next2['vid'] + vid_next2['vid_next1']\n",
    "vid_next2 = vid_next2[['vid', 'vid_next2', 0]]\n",
    "vid_next2 = vid_next2.sort_values(by=['vid', 0], ascending=False)\n",
    "vid_next2 = vid_next2.groupby('vid')['vid_next2'].apply(list).to_dict()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a6601060-6533-48bc-bee4-5a8e7a96788c",
   "metadata": {},
   "source": [
    "# Model Building"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "865d735f-e5a0-4176-8c8c-6362c4c59002",
   "metadata": {},
   "source": [
    "## Model 1: Rule-Based Recommendation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "id": "15005e59-6e70-46a3-bd60-69f47a139b28",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-07-13T08:33:39.169112Z",
     "iopub.status.busy": "2022-07-13T08:33:39.168309Z",
     "iopub.status.idle": "2022-07-13T08:33:55.471425Z",
     "shell.execute_reply": "2022-07-13T08:33:55.470781Z",
     "shell.execute_reply.started": "2022-07-13T08:33:39.169047Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "  6%|▌         | 10000/170909 [00:14<03:55, 683.56it/s]\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "0.39836016398360163"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mrr = []\n",
    "\n",
    "# Local validation: score each user's top-6 list against the held-out next video.\n",
    "for did, did_seq_train in tqdm(seq_local_train.groupby('did')):\n",
    "    last1_vid = did_seq_train['vid'].iloc[-1]\n",
    "    hist_vid = list(did_seq_train['vid'])\n",
    "\n",
    "    # Candidates: videos that frequently follow the last watched video.\n",
    "    candidate_vid = vid_next1.get(last1_vid, [])[:]\n",
    "\n",
    "    if len(candidate_vid) == 0:\n",
    "        # Fallback: the 3 nearest word2vec neighbours of the last video.\n",
    "        w2v_sim = vid_w2v.wv.most_similar(last1_vid)\n",
    "        w2v_sim = [x[0] for x in w2v_sim][:3]\n",
    "        pred_vid = w2v_sim + hot20\n",
    "    else:\n",
    "        pred_vid = candidate_vid + hot20\n",
    "\n",
    "    # Drop already-watched videos, de-duplicate in order, keep the top 6.\n",
    "    pred_vid = [x for x in pred_vid if x not in hist_vid]\n",
    "    pred_vid = list(dict.fromkeys(pred_vid))[:6]\n",
    "    try:\n",
    "        pred_result = pred_vid.index(seq_local_valid.loc[did]['vid'])\n",
    "        mrr.append(1/(pred_result + 1))\n",
    "    except (KeyError, ValueError):\n",
    "        # No validation row for this user, or the true video is not in the top 6.\n",
    "        mrr.append(0)\n",
    "\n",
    "    # Stop early: a sample of about 10,000 users keeps this check fast.\n",
    "    if len(mrr) > 10000:\n",
    "        break\n",
    "\n",
    "np.mean(mrr)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "id": "cfd67a7d-042b-4aa5-b376-7d60ee67a098",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-07-13T08:34:12.150663Z",
     "iopub.status.busy": "2022-07-13T08:34:12.150078Z",
     "iopub.status.idle": "2022-07-13T08:37:32.781467Z",
     "shell.execute_reply": "2022-07-13T08:37:32.780825Z",
     "shell.execute_reply.started": "2022-07-13T08:34:12.150612Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 170909/170909 [03:20<00:00, 853.61it/s] \n"
     ]
    }
   ],
   "source": [
    "# Apply the same rules to every user to build the submission candidates.\n",
    "rule_submit = []\n",
    "for did, did_seq in tqdm(seq_vid_train.groupby('did'), total=seq_train['did'].nunique()):\n",
    "    last1_vid = did_seq['vid'].iloc[-1]\n",
    "    hist_vid = list(did_seq['vid'])\n",
    "\n",
    "    # Candidates: videos that frequently follow the last watched video.\n",
    "    pred_result = vid_next1.get(last1_vid, [])[:]\n",
    "\n",
    "    if len(pred_result) == 0:\n",
    "        # Fallback: the 3 nearest word2vec neighbours of the last video.\n",
    "        w2v_sim = vid_w2v.wv.most_similar(last1_vid)\n",
    "        pred_result = [x[0] for x in w2v_sim][:3]\n",
    "\n",
    "    # Pad with the 20 most popular videos, de-duplicate, drop watched videos.\n",
    "    # Truncation to 6 per user happens later, after the candidate filter.\n",
    "    pred_result += hot20\n",
    "    pred_result = list(dict.fromkeys(pred_result))\n",
    "    pred_result = [x for x in pred_result if x not in hist_vid]\n",
    "    rule_submit += [[did, x] for x in pred_result]\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "id": "5e0ca2bd-0648-4baf-9ba5-260a632aa6ce",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-07-13T08:37:32.783373Z",
     "iopub.status.busy": "2022-07-13T08:37:32.782860Z",
     "iopub.status.idle": "2022-07-13T08:37:34.432883Z",
     "shell.execute_reply": "2022-07-13T08:37:34.432300Z",
     "shell.execute_reply.started": "2022-07-13T08:37:32.783339Z"
    },
    "tags": []
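   },
   "outputs": [],
   "source": [
    "# Column 0 is did, column 1 is vid.\n",
    "rule_submit2 = pd.DataFrame(rule_submit)\n",
    "# Keep only videos that appear in the allowed candidate set.\n",
    "rule_submit2 = rule_submit2[rule_submit2[1].isin(candidate_items['vid'])]"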
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "id": "90e76c5f-3adf-408f-878f-3254e4677a1b",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-07-13T08:37:34.434266Z",
     "iopub.status.busy": "2022-07-13T08:37:34.434090Z",
     "iopub.status.idle": "2022-07-13T08:37:36.652024Z",
     "shell.execute_reply": "2022-07-13T08:37:36.650816Z",
     "shell.execute_reply.started": "2022-07-13T08:37:34.434250Z"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Safety check: drop any repeated (did, vid) rows, keeping the first (highest-ranked) one.\n",
    "rule_submit2 = rule_submit2.drop_duplicates(keep='first')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "id": "42df284e-1973-4b24-8a87-7f8b0e1bec34",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-07-13T08:37:36.653202Z",
     "iopub.status.busy": "2022-07-13T08:37:36.652963Z",
     "iopub.status.idle": "2022-07-13T08:37:40.096615Z",
     "shell.execute_reply": "2022-07-13T08:37:40.095884Z",
     "shell.execute_reply.started": "2022-07-13T08:37:36.653186Z"
    },
    "tags": []
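   },
   "outputs": [],
   "source": [
    "# Keep the top 6 predictions per user (column 0 is did).\n",
    "rule_submit2 = rule_submit2.groupby(0).head(6)\n",
    "rule_submit2.columns = ['did', 'vid']\n",
    "# rank runs from 1 to 6 within each did, in prediction order.\n",
    "rule_submit2['rank'] = rule_submit2.groupby('did').cumcount()+1\n",
    "rule_submit2.to_csv('res.csv', index=False)"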
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "id": "7a9dc8a4-e5bc-466b-9dc5-eeda5808e119",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-07-13T08:37:40.097795Z",
     "iopub.status.busy": "2022-07-13T08:37:40.097476Z",
     "iopub.status.idle": "2022-07-13T08:37:40.307418Z",
     "shell.execute_reply": "2022-07-13T08:37:40.306973Z",
     "shell.execute_reply.started": "2022-07-13T08:37:40.097776Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "d2964663b3b8aa06c6fe1a66838e8ef8    6\n",
       "4f9a18d61efe3223afbb21669bd3eddd    6\n",
       "281dc7301c214824659e8d259b9b6957    6\n",
       "52c109cd66f568e4a42b3d91bb834d03    6\n",
       "baa9bf96009c26a03e4db8ede1f3647c    6\n",
       "                                   ..\n",
       "70926cc3f29e3b79084e813bd244a3ca    6\n",
       "6d91285c8c0e12045d8301061f1e8c6c    6\n",
       "9c0f6e108fec513f7a5488555a8075ec    6\n",
       "7f810619191c68e504b029a2b4ce6de0    6\n",
       "cc6a793a9324a905c302b4b04484335b    6\n",
       "Name: did, Length: 170909, dtype: int64"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "rule_submit2['did'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "id": "d1b56744-bd27-4ef6-8515-84885d145325",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-07-13T08:37:40.308360Z",
     "iopub.status.busy": "2022-07-13T08:37:40.308138Z",
     "iopub.status.idle": "2022-07-13T08:37:40.684065Z",
     "shell.execute_reply": "2022-07-13T08:37:40.682512Z",
     "shell.execute_reply.started": "2022-07-13T08:37:40.308344Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1025455 res.csv\n"
     ]
    }
   ],
   "source": [
    "!wc -l res.csv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "id": "bc9fe07d-619f-4f33-bfb2-f185c89179bf",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-07-13T08:37:40.686837Z",
     "iopub.status.busy": "2022-07-13T08:37:40.686298Z",
     "iopub.status.idle": "2022-07-13T08:37:42.202351Z",
     "shell.execute_reply": "2022-07-13T08:37:42.200885Z",
     "shell.execute_reply.started": "2022-07-13T08:37:40.686788Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "updating: res.csv (deflated 84%)\n"
     ]
    }
   ],
   "source": [
    "!zip -r res.csv.zip res.csv"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.6 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  },
  "toc-autonumbering": true
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
